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Foreword 


Artificial intelligence (AI) is rapidly affecting more aspects of our lives thanks to signif- 
icant advancements in its research and the widespread usage of interactive goods. This 
is leading to the birth of several new social phenomena. 

Many nations have been working to comprehend these phenomena and discover 
solutions for moving artificial intelligence development in the proper direction to bene- 
fit individuals and communities at large. These initiatives necessitate multidisciplinary 
approaches that span not only the scientific fields involved in the creation of artifi- 
cial intelligence and human-computer interaction but also strong collaboration between 
researchers and practitioners. 

Because of this, the major objective of the MIDI conference is to combine two, 
up until recently distinct, areas of computer science research: artificial intelligence and 
human-technology interaction. Beginning in 2020, topics discussed at the MIDI confer- 
ence will include artificial intelligence-related challenges in addition to interface design 
and user experience. 

There is no denying that society is becoming more and more conscious of issues 
associated to the use of artificial intelligence in solutions. However, the development 
of artificial intelligence technology is advancing far more quickly than the search for 
solutions to the moral, social, and economic problems of the present. Both social research 
and AI technology expertise are required for the discussion of them to provide acceptable 
answers. As the conference’s organizers, we are certain that the expanded format will 
make it an even more engaging platform for experts in the fields of artificial intelligence 
and human-technology interaction to exchange experiences. 

As the conference hosted two Special Sessions, in 2022 the volume has been divided 
into four parts: (1) Machine Intelligence, (2) Digital Interaction, (3) Special Session: 
Advances in Collaborative Robotics, and (4) Special Session: Interacting with Virtual 
Reality Applications. As a result, the chapters deal with such topics as reinforcement 
learning, music prediction, medical usage of AI (prostate MRI analysis, chest scans 
analysis), co-designing immersive environments, gestural interaction with 3D data, 
collaborative robotics (i.e., electronic skin), and interaction with VR apps. 

We believe that all readers interested in emerging trends as well as creators of end- 
user IT products and services will find inspiration and useful theoretical and practical 
information in this book. The eventual success or failure of a newly generated product 
depends on the focus on meeting the requirements of people in a future where technical 
solutions based on artificial intelligence are developed by people for people. No matter 
how cutting-edge a technology solution may be, if it is not sufficiently connected to the 
lifestyle or other variables affecting the target users’ social behaviors, they will reject 
it. Underestimating the value of technology is a mistake, as evidenced by the history 
of technical progress and examples of technological solutions developed even by the 
biggest tycoons in the world. 
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During this year’s edition of the conference, two papers were awarded Best Paper 
Award in memory of professor Krzysztof Marasek. The professor was one of the initia- 
tors of the first MIDI conference focused on the user’s perspective in computer science 
and the co-organizer of subsequent editions. Being an outstanding scientist and engineer, 
an excellent specialist in the field of linguistics, voice user interfaces, and voice-based 
interaction, he understood the importance of research on technological solutions con- 
ducted from the user’s perspective. We hope that this year’s and subsequent editions 
of the MIDI conference will step up to the mark and continue the work of professor 
Krzysztof Marasek. 
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Abstract. Interdisciplinary research combines computer vision with 
stage light design to automatically detect light fixtures’ positions to cre- 
ate light animations. Multiple programmable light fixtures are often used 
in theaters, the event industry, and interactive installations. When cre- 
ating complex animations such as a wave traveling from one side to the 
other through multiple light fixtures array, all lights’ positions must be 
known beforehand. Traditionally the position of the light is marked in 
the technical plan. However, technicians make mistakes during the instal- 
lation and sometimes install the light in a different position. In such a 
case, time-consuming troubleshooting is needed to determine which light 
is misplaced and either correct the position in the software or manually 
move the light to the correct position. Our system saves time during 
installation and produces a light id and position pairs that users can use 
in various lighting control software. As a result, users can improvise and 
change the light positions more intuitively without needing a technical 
plan. Our system saves installation costs and enables rapid prototyping 
of light shows to create previously impossible organic designs. We ver- 
ified the system in a controlled experiment and measured the influence 
of camera resolution on accuracy. 


Keywords: DMX512 - light fixture - computer vision * position 
detection - camera 


1 Introduction 


1.1 Use Case 


We focus on large-scale dynamic light installations. Consider the light installa- 
tion at 131 South Dearborn, Chicago, the USA [22] as an example of an ideal 
use case. The installation consists of 925 glass bubbles, and each bubble has its 
light source. Install 925 individually programmable light sources and ensure that 
the wiring is right pose a considerable challenge and is prone to mistakes during 
installation. We aim to automate the light position mapping process with the 
proposed system. 

Another use case we tested was mapping multiple programmable led rings and 
strips. Instead of manually marking each fixture’s position, we have automated 
© The Author(s) 2023 
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the process. As a result, the setup can be changed frequently to help designers 
find the ideal configuration and immediately test it. Please watch the experiment 
video [15] documenting the process. 


1.2 Programming Light Show 


Without knowing the light positions, we can create only simple light shows, 
such as changing the light parameters uniformly or creating animations based 
on noise. To create a more complex light show, we need to distinguish individual 
lights and know their position. 

We can then create a simulation with light sources represented as points in 
space. See Figure 1. We add virtual objects and animate their position. We can 
achieve various light effects by detecting the collision of light sources’ positions 
with a volume of moving virtual objects. We turn on the light source when 
the light is inside the virtual object and turn it off when it is outside. We can 
also use multiple virtual objects to create more elaborate animations or map 
them interactively based on sensor inputs. For example, one can map people’s 
movement to light intensity in different sectors. 


= I © 
>, POINTS = LIGHT SOURCES # TURN ON LIGHTS THAT ARE INSIDE 
z | > VIRTUAL OBJECT 

4 


; VIRTUAL 3D OBJECT | 


: 
: e 


|| 4 DMX DATA OUT 
ANIMATED | a | £ 


Fig. 1. Schema of light scene animation: Light sources represented as points in space 
collide with the animated virtual 3D object. Lights inside the object will be turned on. 


1.3 Light Network Control 


Light fixtures are often controlled using DMX512 [10] protocol. Traditionally 
DMX address (a unique id of the light) is selected using a hardware DIP switch 
directly on the light fixture. Another option is to use Remote Device Manage- 
ment (RDM |[9]) commands from the control software [6]. However, RDM is only 
available with some DMX light fixtures, so we often rely on manual selection, 
making changing DMX addresses difficult. According to the technical plan, each 
light with the appropriate DMX address must be installed in the correct position. 

To control the lights, most often, we use the Artnet [2,22] or sACN [1,11] 
protocol - an extension of DMX standard that enables us to control more lights 
and use network topology and devices such as Ethernet switches to send DMX 
packets over the network. 
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In essence, we can control individual lights from a single networked computer. 
We need an ethernet adapter and Artnet/sACN node to convert the signal to 
DMX. DMX signal is then sent to DMX driver that maps the DMX values to 
voltage and current to dim individual lights. See Figure 2. 


Artnet-DMX-1 Moving head light 


Artnet to DMX RJ45 cable XLR-3 cable 
ee REZ >B-> é- ©--- 8 


Artnet-DMX-1 Par light 


RJ45 cable J cable XLR-3 cable 
»—— fap —— Na — > -> A 1.2... 
> 


Computer 
Artnet-DMX-1 LED light 
RJ45 cable GW XLR-3 cable A % % % 
—U gto — > > .%0000 
Le = = <= z 


Fig.2. Artnet light network diagram (https://www.ledlightinghut.com/artnet- 
dmx512-converter.html). 


2 State of the Art 


Various systems using a camera to control lights exist. Most of them deal with 
finding a user or controlling the light to track the user. In most cases, available 
solutions rely on knowing the light fixtures’ position in space. We are trying to 
solve a different problem, localizing light sources relative to the camera. Still, 
some principles can be used for both problems. 

Luxapose [14] system used a mobile phone as a camera and modified light fix- 
tures to track a mobile phone’s position. Light sources are modulated to produce 
pulses of light with encoded position and ID information that can be captured 
in a single frame exploiting the rolling shutter effect. While we are trying to 
determine the position of the light fixture, Luxapose is tracking a user. Theo- 
retically, we could repurpose the system and use the known camera position to 
determine the light position. Unfortunately, the Luxapose system requires light 
source modification which is not practical in our use case. Moreover, the camera 
position relative to the light source would have to be known. While feasible, it 
would require precise measurement on site. 

Similar to the Luxapose, a paper by Hossan and Tanvir [13] uses multiple 
known light fixtures to localize the camera position. We can not use triangulation 
methods to determine the camera position when using a single camera in our 
system. In practice finding the precise camera position relative to the light fixture 
would have to be repeated for all lights as we can not guarantee their position 
relative to each other. Such manual measurements would completely negate any 
benefits of automatization. The authors use found blob pixel count to measure 
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the distance from the camera to the light fixture. A similar approach could be 
adopted while moving the camera back and forth to determine the distance 
from the camera to the light source. Other approaches [24] use a photo-diode 
as a sensor installed in the light fixture to track objects based on changes in 
light propagation. Multiple light fixtures have to be taken into account, or a 
quadrant photo-diode is used [7]. The camera is no longer needed in these cases, 
but the light fixtures have to be modified to include the appropriate sensor 
and communication unit. For example, BlackTrax [8], a commercial solution for 
motorized lights, uses multiple cameras and an infrared beacon to track the 
moving target. Craig Hiller developed a system to detect light’s positions and 
even classify them as a fluorescent bulb, tube, or LED light [12]. The project 
relies on a Tango tablet [17] aligned with a DSLR camera and IMU in one 
package. The user walks under the lights to map their position and create a 
spatial map. Tracking is only possible with sensors being perpendicular to the 
light fixture, which is fine when detecting ceiling-mounted lights but would fail 
with vertical or organically shaped installations. The reported error is also not 
suitable for the light show scenario, with some lights being false positives and 
others missed during detection. 


3 Software 


3.1 Programming Environment 


The main program is written in Java Processing framework [4]. To acquire the 
camera stream, we used Gstreamer [23] based library [11]. To perform computer 
vision tasks, we have used the Java wrapper of OpenCV library [5]. 


3.2 Light Fixture Detection 


To find the light position, we create a frame difference between a frame with all 
lights turned off and a frame with a single light turned on. Then we threshold 
the resulting absolute difference to a binary image. We find the contours of the 
light blobs. Finally, we sort found blobs based on their area. We select the blob 
with the biggest area and find its centroid. The resulting x and y coordinates 
are saved and associated with the light DMX address. When we have processed 
all the lights, we normalize their coordinates and save all the information into 
a JSON file. Later this file can be loaded into light control software such as 
Touchdesigner [21], Madrix [3], OpenFrameWorks [20], Processing [4], VVVV 
[18], and similar software used to create light shows. 


3.3 GUI and User Input 


We have also developed a GUI for easier usage. We provide multiple options, 
such as selecting a target IP, setting custom binary thresholds, selecting from 
multiple cameras, and setting minimal and maximal blob size. Users can also 
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select whether to cycle through all lights automatically or one by one by clicking 
the button. The user can manually adjust every detected light position with a 
mouse if needed. Users can also create custom masks to select the area in the 
camera image that should be ignored during detection. Furthermore, we have 
also enabled perspective corner pin transformation to be applied to the camera 
image. Acquiring undistorted camera images is essential to measure uniform 
distances between light fixtures correctly. 


4 Experiment 


IC DMX LIGHT CALIBRATION - FPS: 60 - D X 
SETI CALIBARTE 


Press M to show/hide GUI, press L' to show/hide lights 
Ee BLOBHINPR 
LOG HEŁP 
== THRESHOLD 


WSUWA 


| UNIEODM2 UNI:0 DM; UNI: 0 DM! UNI: 0 DMX: 28 


me 
T poms UNIZO DM! UNI:0 DM UNI:0 DM) 


| UNEODMX'3 UNGO DMX 13 UNIO DM UNI: 0 DM) 
UNI: 0 DMX: 1 UNIO DM) UNIO DM) UNI: 0 DMX: 29 
UNIO OMX: 10 UNI: 0 DM) UNEO DM; 
UNI: 0 DMX: 4 UNI: 0 DMX: 19 UNI: 0 DMX: 27 
UNI: 0 DMX: 8 UNIO DMX. 14 
See UNI: 0 DMX:30 


UNI:0 DMX:11 UNI: O DMX: 17 


za 


Fig. 3. Experiment setup - LEDs mounted on the wooden plate 


4.1 Lights Setup 


We have tested our system on a setup with 30 individually programmable LED 
light sources controlled via the ArtNet network. Lights were installed on 250cm 
by 125cm wooden plate, all facing the same direction. See Figure 3. Each light 
is 3.5cm in diameter with 6 LEDs controlled as a single symmetrical light source. 


4.2 Camera 


We have used the Logitech C922 camera, a widely available and affordable stan- 
dard web camera with a USB interface. The camera was positioned perpendic- 
ular to the wooden plate 272cm away. The camera’s Diagonal Field of View 
(FOV) is 78°, horizontal FOV is 70.42°, vertical FOV is 43.3°, and Focal Length 
is 3.67mm. We tested the setup, once using 640*480px camera resolution and 
once 1920*1080px, so we could determine if the used resolution correlates with 
accuracy. 
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Table 1. Mean error in the position detection and standard deviation 


resolution pixel pitch error X |errorY |STDEV X|STDEV Y 
640*480 px |4.53mm  |2.912mm |2.426mm |0.678 px | 0.507 px 
1920*1080 px | 2.02mm  |2.885 mm | 1.947 mm | 1.069 px |1.318 px 


5 Results 


Experiment data is available at Zenodo 6814223 [16]. All installed lights were 
detected correctly. See the results in Table 1. In the case of 640*480px camera 
resolution, 1 pixel corresponded to 4.53mm in the wooden plate plane, and the 
average position error on the horizontal axis was 2.912mm and 2.426mm on the 
vertical axis. The maximum error on the horizontal axis was 2 pixels, and 1 pixel 
on the vertical axis with standard deviation 0.678px and 0.507px respectively. 

In the case of 1920*1080px camera resolution, 1 pixel corresponded to 
2.02mm in the wooden plate plane, and the average position error on the hori- 
zontal axis was 2.885mm and 1.947mm on the vertical axis. The maximum error 
on the horizontal axis was 4 pixels, and 6 pixels on the vertical axis with standard 
deviation 1.069px and 1.318px respectively. 


6 Discussion 


We assumed that the higher resolution of the camera and shorter distance to 
the lights would produce more accurate results as 1cm would be represented by 
more pixels in the camera image. Resolution does not play a crucial role, as we 
have not observed a significant accuracy increase when using 1920* 1080px over 
640* 480px. 

We need to know if particular light is left or right relative to the other light 
to propagate the animated wave through them correctly. We have successfully 
obtained correct relative relationships between lights irrespective of the used 
camera resolution. High absolute accuracy can be beneficial but unnecessary for 
the light-show creation. 

The light source shines directly at the camera and produces a lens flare. Lens 
flare does not worsen the detection as long as it is uniform in all directions, so 
we can assume that the light source is in the center. 


7 Conclusion 


We offer a robust way to automatically detect installed light fixtures positions 
with a single monocular camera. Furthermore, we save the information with the 
respective light DMX address in both machine and human-readable formats to 
be used in various light control software. Our proposed automatic light detection 
method can be thought of as a proof of concept with clear time and cost-saving 
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benefits. More importantly, it opens the way for new stage light design possibili- 
ties. More testing in real-world scenarios is needed to further verify the presented 
system’s viability. 


8 Limitations 


An unobstructed view of all light sources is needed to detect their position 
correctly. Position mapping is possible only if we find a view in which the light 
fixtures do not overlap. For example, it would be challenging to automatically 
get the positions of light fixtures organized in a 3D helix shape. 

The best use case for our system is the light sources positioned in a single 
layer. For example, lights are hanging from the ceiling. 

The problem occurs when lights are organized in multiple layers. Such as 
multiple lights on a single string or rod beside each other. It might not be 
practical to use our method in such a case. 

Reflections cause another limitation. When reflective surfaces surround the 
light sources, it can create several false-positive hot spots in-camera images that 
might be hard to distinguish from the actual light source. For example, a chrome- 
plated ceiling has high reflectivity. In this case, reflections have to be eliminated 
before detection. 


9 Future Research 


We can improve the tool to enable 3D position mapping. We can use the binoc- 
ular stereoscopic camera to calculate the distance from the camera to the light. 
Alternatively, we can reposition a single monocular camera to acquire two points 
of view to achieve the same result. We could further improve the usability by 
merging multiple cameras to cover a larger area to map large-scale installations. 
Another approach would be to enable sequential mapping. After mapping one 
place, the user would physically move the camera to cover another neighboring 
area. The relative position change of camera origin could be calculated by the 
standard SLAM method [19]. 
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Abstract. Intelligent Traffic Monitoring and Management System (TMMS) is a 
growing research area as cities infrastructure continues to evolve. Traffic situation 
is demanding innovative solutions for effective monitoring and management given 
the complex nature of the urban scenario. A major focus of this research domain is 
fine-grained vehicles classification that requires detection and recognition of dis- 
tinct features of vehicles. Some of these features are semantic based while others 
are appearance based. One such appearance-based feature of a vehicle is its logo. 
Logo detection helps with identification of a vehicle’s make during fine-grained 
classification process. There are various deep learning methods which give good 
performance for such object detection tasks. However, it is challenging to exploit 
these methods due to smaller size of logo especially in a surveillance environment. 
This work firstly presents a deep learning-based approach for detection of vehicles’ 
logos in camera video feeds. Due to small size of logos, a unique pipeline using 
three different deep learning models is designed. Firstly, a modified Improved 
Warped Planar Object Detection Network (IWPOD-NET) selects a Region of 
Interest (ROT) and adjusts the orientation of vehicle logo. Then YOLO (You Only 
Look Once) v5 is used to detect the logo part in the selected ROI and finally, 
EfficientNet is used to further classify logo into different classes. This pipeline 
is tested on four surveillance environments namely toll control, law enforcement, 
dashcam, and parking lot access control. Comparative analysis shows accuracy 
improvement with this proposed approach in each testing case. A pose variance 
analysis is also performed to determine the orientation limits to which this app- 
roach can work. Secondly, a custom dataset, VL-10 (Vehicle Logos) is presented 
which provided further insights into the challenges w.r.t local environment settings. 
The whole approach improved the overall performance of the logo detection and 
recognition system. 


Keywords: Video surveillance - Vehicle detection - Vehicle logo detection - 
Fine-grained vehicle logo classification - Deep learning - IWPOD-NET - 
EfficientNet - YOLOv$ - Image unwarping 
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1 Introduction 


Intelligent Traffic Monitoring and Management System (TMMS) has an increasing sig- 
nificance as the traffic infrastructure is advancing. With the advent of technology, intelli- 
gent TMMS has taken over conventional TMMS and has become an emerging research 
area. It includes detection, recognition and fine-grained classification of vehicles, auto- 
matic license plate recognition and various kinds of traffic analytics. Its applications 
vary from commercial domains to national security levels e.g., surveillance, security sys- 
tem, traffic congestion avoidance, accident prevention, advanced driver assistance sys- 
tems, access control systems, intelligent parking systems, electronic toll collection, etc. 
These applications necessitate the development of more robust techniques for Intelligent 
TMMS. 

The fine-grained classification of vehicles is the process of categorizing a vehicle 
into its make and model, generally known as Vehicles Make and Model Recognition 
(VMMR). Many techniques are used for this classification; some by considering the 
whole vehicle or others through a certain area of vehicles (e.g., frontal view, grill, or 
logo). A vehicle logo is a distinct appearance-based feature which is an important part of 
this whole process. Vehicle logo detection/recognition is required for many applications 
of TMMS. It can be used for estimation of brand reputation and traffic monitoring etc. 
Its recognition also plays an important role in the scenarios where the authenticity of 
vehicle’s make is doubtful. For example, the scenario where a stolen vehicle’s license 
plate is replaced with another one, or when a license plate is obscured or illegible, 
which affects the readability process in any license plate recognition application. The 
logo detection can be used as an authenticity validation step for this purpose. However, 
identification through vehicle logo is a challenging task due to its smaller size. Detection 
and recognition of small objects is a fundamental challenge in Computer Vision and is a 
growing research area these days. Many researchers have achieved high performance for 
small objects detection using Convolutional Neural Networks (CNNs). CNN is able to 
learn various image-based features from large-scale data without the need of any human 
intervention. Compared to different kinds of small objects, the vehicle logo has more 
complex rich content information at times due to their unusual designs. These small 
sized logos are also affected by background clutter and noise. Similarly, most of the 
times, logo is blurred and not properly illuminated, and even not visible due to rainy or 
snowy weather which causes difficulties in features identification. An overview of some 
challenges associated with this logo detection and recognition task are provided below: 


e Logo Size and Resolution. Logos account for only ~1% of the frame obtained from 
camera. Due to camera distance, the resolution is low at times. These issues affect 
the feature extraction process. 

e Illumination Variance. Colors may vary under different illumination conditions espe- 
cially in low-light conditions, where identified features may not be that effective. 
Similarly, vehicle logos usually have high reflectance luminous which causes issues 
in features extraction. 

e Logo Location. The locations for vehicle logos vary among various vehicle manufac- 
turers. Some manufacturers place their vehicle logos on radiator grilles, while some 
place them on the vehicle’s front hood. This causes difficulties in the identification 
of logo boundaries. 
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e Background Clutter and Noise. Logo is a comparatively smaller part of vehicle due 
to which there are many sources of background interference including radiator grille, 
bumper and other small objects. These objects distract the feature extraction and 
results in increase of false detections. 


This research has modified a state-of-the-art license plate detection model IWPOD- 
NET, to extract the desired ROI and then further unwarp it to fix the distortions in 
varying angles [32]. The used surveillance recordings are low in resolution and cause 
the accuracy to decrease. Therefore, a combination of different CNN models are used 
for detection and recognition purposes. The results show a significant improvement in 
overall detection and recognition of vehicle logos. The main contributions of this paper 
are highlighted as followed. 


e An offline vehicle logo detection and recognition approach that uses video feeds 
acquired through low-resolution surveillance cameras for authenticity validation of 
vehicles. 

e Improved logo detection and classification based on a modified state-of-the-art license 
plate detection network. 

e Pose variance analysis to identify the varying angles on which the proposed approach 
can effectively work. 

e Acustom vehicle logo dataset, VL-10, containing logos of 10 commonly used vehicle 
makes and models. 


The rest of the paper is organized as followed: Sect. 2 provides a concise overview 
of the related research work. Section 3 provides a detailed working methodology of the 
proposed approaches and gives an overview of the proposed dataset. Section 4 discusses 
the test cases used to conduct this research and presents a comparative analysis of the 
results obtained from the experimentation. Finally, Sect. 5 concludes the papers and 
provides a future direction of this research. 


2 Related Works 


Work in different domains of intelligent traffic monitoring and management system has 
been contributed by many researchers. Setchell discussed monitoring and management 
of road traffic using computer vision in details [1]. The paper presented two vision-based 
traffic monitoring systems including license plate recognition system and a road traffic 
monitoring system to track vehicles. Different ideas for vision-based traffic monitoring 
and management systems have been proposed [2-6]. VMMR is also an important part 
of traffic monitoring and management systems. 

There have been many related approaches using conventional algorithms such as 
Scale-invariant Feature Transform (SIFT), Histogram of Oriented Gradients (HOG) and 
Sequential Pattern Mining (SPM) etc. [7-9]. Usually, logo is detected through the ref- 
erence of license plate [10-12]. After the detection of license plate, different techniques 
are then implemented to segment the logo [13-15]. Psyllos et al. introduced an enhanced 
SIFT-based features extraction mechanism for vehicle logo recognition. The experiment 
results showed a significant improvement of recognition accuracy compared to conven- 
tional SIFT algorithm [16]. Mao et al. proposed a novel vehicle logo detection method 
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which highlighted logo regions using direction filters and saliency map. Information 
entropy was then used to choose the binary image and for precise location of logo region 
[17]. Yu et al. proposed a system for vehicle logo recognition based on Bag-of-Words 
(BoW) [18]. The vehicle logo images were represented as histograms of visual words 
and were classified by SVM in three steps. These steps involved SIFT feature extraction, 
quantization of features into visual words and histograms of visual words with spatial 
information. Wang ef al. presented a vehicle logo detection/recognition system which 
uses edge features to find logo in the rough logo region detected through prior knowl- 
edge [19]. Then, a combination of template matching and HOG were used to identify 
the category of logo. 

These conventional techniques are usually dependent on hand-crafted features, which 
lack robustness in certain conditions. These conventional algorithms are relatively easier 
to implement, however, these algorithms are computationally expensive. Compared to 
these conventional approaches, various deep learning-based methods are also explored 
by researchers. The deep learning-based algorithms have significantly better perfor- 
mances in detection of objects under different complex environments as discussed in 
Sect. 1. Tong. K et al. presented an in-depth analysis of different deep learning-based 
small objects detection approaches [20]. The authors showed with experimentation that 
techniques such as multi-scale feature learning, data augmentation and different training 
strategies can significantly improve the small objects detection. Similarly, these state- 
of-the-art techniques have also been tried and tested for the task of small objects such 
as vehicle logo. 

Yang et al. proposed a modified YOLOv3 model for vehicle logo detection in com- 
plex scenes [21]. The approach showed good accuracy on the proposed vehicle logo 
detection dataset. Zhang et al. proposed a real-time lightweight vehicle logo detection 
system based on deep convolutional networks [22]. The experimentation showed a sig- 
nificant improvement in accuracy of vehicle logo detection task. Jiang et al. proposed 
an improved YOLOv4 model which enhanced the backbone feature extraction network 
for efficient extraction of small vehicle logos [23]. It further used convolutional trans- 
former to reduce influence of complex backgrounds. Yang et al. proposed an improved 
YOLOVv2 network which provided fast and accurate vehicle logo detection [24]. The 
experimental results show that this approach significantly improved accuracy and speed 
of logo detection. 

The surveillance recordings usually have low resolutions which makes the logo 
detection challenging, particularly at varying angles. Various techniques are used to 
improve the quality of video prior to the detection of objects. Some researchers used 
Super Resolution algorithms to overcome these limitations and to convert these low- 
resolution images to high-resolution ones [25-29]. Others used near-field and far-field 
video enhancement techniques [30]. Some researchers suggested the use of ROI rate 
control scheme for surveillance videos [31]. However, this proposed research work 
adjusts the orientation of logos at varying angles, which improves the overall vehicle 
detection and recognition task. 
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3 Working Methodology 


3.1 Selected Models 


A unique pipeline using a combination of different deep learning models is designed 
in this study for improved vehicle detection and recognition. The specific details of the 
selected models are as follows: 


Unwarping Model (IWPOD-NET). Improved Warped Planar Object Detection Network 
(IWPOD-NET) [32] is able to detect four corners of vehicle license plate in a variety of 
unconstrained scenarios. It then unwarps the license plate to a fronto-parallel view and 
eliminates perspective related issues. This network was introduced to tackle license plate 
detection with varying viewpoints. However, this research modified the model to serve as 
basis for ROI localization and orientation adjustment of logos. The localized area above 
the license plate was extended by a variable (assuming that the logo placement is always 
on the top of license plate) to include the logo part as well in complete ROI, which was 
then unwarped. For example, if the height of license plate is 3 units and variable is set to 
be 2 units, then the ROI will be extended by 2 x 3 units vertically upwards. Workflows 
for both original and modified network are shown in the following Fig. 1. The characters 
on the license plate of the vehicles utilized in this study have been concealed in order to 
maintain the privacy of the vehicle owners. 


Oriented License License Plate License Plate License Plate 
Plate Detection Extraction Unwarping Recognition 


(a) 
Oriented License Oriented ROI Oriented ROI 
Plate Detection Selection Extraction 
j 
(b) 


Fig. 1. Workflows for both the Original IWPOD-NET (a) and Modified IWPOD-NET (b) models 


Logo Detection Model (YOLOv5m). Advent of YOLO models improved the real-time 
object detection. Many versions of YOLO have been introduced so far, which have sig- 
nificantly improved object detection. This research utilizes YOLOVS, which has different 
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architecture series including P5 and P6. These series are further divided into nano (n), 
small (s), medium (m), large (1) and extra-large (x) that vary on the basis of parameter 
sizes and provide optimizations for particular image sizes. For example, P5 is optimized 
for training images of sizes 640 x 640. For the desired task, YOLO-v5-P5 medium sized 
model is used in this system of logo detection. Due to low classification accuracies, the 
model was used to detect the presence of logo only. For a comparative analysis, results 
are also presented in Table 2 for the approach when YOLO was used for both detec- 
tion and classification purpose. The unwarped image from previous step is then fed to 
YOLO-v5m model for the detection of logo. The workflow is shown in following Fig. 2. 
As shown in the figure, logo is successfully extracted from the complete ROI. 


Unwarped ROI Logo Detection Logo Extraction 
— - 


Fig. 2. Logo Detection using YOLO-v5m 


Logo Classification Model (EfficientNet). EfficientNet [29] introduced by Google is a 
commonly used CNN architecture for image classification purposes. The model achieves 
state-of-the-art performance, in addition to being faster and lightweight. This research 
work trained EfficientNet on logos in the proposed dataset. Firstly, all images were 
cropped to the logo part, based on dimensions from the annotated images that were fed 
to the object detection model. Then these cropped logo images were used for training 
the classification model. In testing phase, once the logo is detected in ROI, it is then 
fed to the classification model as an input. It then classifies the logo into its respective 
class. The addition of EfficientNet has significantly improved the results as discussed in 
Sect. 4. 


3.2 Dataset 


A high-quality dataset is essentially required for tackling various computer vision tasks 
involving object detection and recognition etc. The lack of a local dataset can become 
a huge challenge when performing these tasks for any particular region with a specific 
complex traffic environment. To the best of our knowledge, there is no standard local 
dataset available. Therefore, a local dataset of vehicles, VL-10, is developed where logos 
are either prominently visible or partly visible due to varying angles as in surveillance 
environment as shown in Fig. 3. 
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All the logos are self-collected from various local and online sources (e.g., different 
car selling websites with publicly available images). The logos are categorized into ten 
different vehicle classes, including Honda, Suzuki, Toyota, Kia, Hyundai, Mitsubishi, 
Daihatsu, Faw, Hino and Nissan. A total of 500 and 50 images per class were selected 
to form the training and validation sets, respectively. The data augmentation was then 
carried out by adding blur and noise to each image. Therefore, the total images per class 
became 1500 and 150 for training and validation sets, respectively. 


Fig. 3. Some Images from VL-10 Dataset 


4 Results and Discussions 


4.1 Test Cases 


Different surveillance scenarios have been chosen for test/evaluation of the proposed 
model pipeline. These include video streams from toll control (i.e., CCTV video from 
height), law enforcement (road-level cameras), parking lot access validation (cameras 
installed at various entry/exit points), and dashboard cameras. The highlights of each test 
case are provided in Table 1. Similarly, example frames from each test case are shown 
in Fig. 4. 


4.2 Comparative Analysis 


Three different approaches were tested, and their comparative analysis is accordingly 
presented in this section. 


Approach 1 - Without IWPOD-NET Using  YOLOv5 for Both 
Detection and Classification. The YOLOv5 model was trained on VL-10 dataset for 
both object detection and classification tasks. All test scenarios were first passed to this 
model to get baseline accuracies. The mAP for detection and classification is presented 
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in Table 2 for each test case. IWPOD-NET was not used in this approach so that results 
could be compared, and the impact of IWPOD-NET could be observed. 


Approach 2 - With IWPOD-NET Using YOLOv5 for Both Detection 
and Classification. A modified IWPOD-NET was introduced. The detected vehicles 
were passed to IWPOD-NET for the ROI extraction and unwarping. After extraction of 
ROI, it is then passed to the YOLOv5 model trained on VL-10 dataset. Here, YOLOv5 
detects and classifies the logos into their respective classes. Results obtained in the form 
of mAP are shown in Table 2. It can be seen that results were improved to a certain level. 
The highest improvement shown is in the Law enforcement test scenario. This is due to 
the fact that this test set has side camera viewpoints and IWPOD-NET helped to unwarp 
these side views really well. On the other hand, the lowest improvement is shown by 
parking lot access validation test set, which is apparently because camera viewpoint was 
already frontal, and the license plates were not very oriented in this case. 


Approach 3 - With IWPOD-NET Using YOLOv5 for Detection and EfficientNet 
for Classification. In the third approach, while using IWPOD-NET and YOLOv5, 
another model, EfficientNet was introduced. YOLOv5 was trained to detect the pres- 
ence of logo only. All the classes of VL-10 dataset were trained on EfficientNet which 
was then used for logo classification. Firstly, each test dataset was passed to modified 
IWPOD-NET for the extraction and unwarping of ROI. The ROI was then passed to 
YOLOv5 model for the detection of logo. After the logo was detected, it was then 
passed to the EfficientNet for classification of the detected logo. The mAP for the detec- 
tion by YOLOv5 and the accuracy of classification by EfficientNet is shown in Table 2. 
Considerable improvements in the results were observed. The highest improvement in 
the detection were observed in the Dashcam testing case. The margin left was due to 
the motion blur present in this case since both the objects (vehicle/logo) and the camera 
were in motion. Highest classification accuracy was produced in the case of parking lot 
access validation. The margin left in this case was due to the inclusion of night images in 
dataset where the logos were not perfectly visible, and the use of streetlights create the 
issue of reflection. Approach 3 stood out from the other two approaches. Therefore, this 
pipeline could be used as an offline tool for logo detection and classification in various 
surveillance environments. 


4.3 Pose Variation Calculation 


Pose variation analysis was conducted to determine the capacity of the proposed model 
to detect logos accurately at varying angles. For this purpose, some images were captured 
in the daylight similar to the settings in parking lot access validation dataset. Different 
angles of each vehicle were captured as shown in Fig. 5(a). Poses were categorized by 
comparing the oriented bounding boxes and rectangular bounding boxes of license plate 
as shown in Fig. 5(b). IWPOD-NET detected the four corners of license plate (green 
dots). It then made an oriented bounding box joining these four corner points (blue 
bounding box). Another algorithm was used to make rectangular bounding boxes (red 
bounding boxes). The upper lines for both bounding boxes were used to identify the 
poses. This was done by calculating the angle “6” subtended between both lines. 
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According to the observations, the proposed module was able to detect logo in the 
range of 0-30° while it was mostly undetectable beyond this range. In the range 0— 
10°, the module’s logo detection mean Average Precision (mAP) was 0.91 while the 
classification accuracy was 0.97. In the range 10—20°, the module’s logo detection mAP 
was 0.85 while the classification accuracy was 0.90. On the other hand, in the range 20— 
30°, the module’s logo detection mAP was 0.80 while the classification accuracy dropped 
to 0.75. Angles from 0—40° are depicted in Fig. 6 along with the visual appearance of 


logos. 
Table 1. Highlights of video data for different test cases 
Toll Control Law Enforcement Dashcam Parking Lot Access 
Validation 
Sequence Type | Outdoor Outdoor Outdoor Outdoor 
Environment Sunny Sunny Day (Sunny), | Day (Sunny), Night 
Night 
Object Class Logo Logo Logo Logo 
Object Size Small Small to Medium | Medium to Medium 
Small 
Object Speed Medium to Fast | Medium to Fast | Medium to Slow 
Fast 
Camera View Top frontal and | Side Frontal and | Frontal and Frontal 
back Side back Back 
Camera Motion | Static In motion In Motion Static 
Resolution 3840 x 2160 1920 x 1080 1920 x 1080 | 1920 x 1080 
Noise Level Low Medium to high | High (motion | Low 
(motion blur) blur) 


Table 2. Results for each testcase using all the three approaches 


Test Case Approach 1 Approach 2 Approach 3 
Detector Classifier 
mAP mAP mAP Accuracy 
Toll Control 0.470 0.589 0.73 0.77 
Law Enforcement 0.322 0.453 0.624 0.64 
Dashcam 0.391 0.486 0.685 0.71 
Parking Lot Access Validation 0.528 0.61 0.79 0.82 
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Dashcam Law Enforcement Toll Control 


Parking Lot Access 
Validation 


(a) (b) (c) (d) 


Fig. 4. (a) Single frame from each test case; (b)-(d) Individual vehicles in respective datasets of 
each test case along with their original logo containing ROIs and their respective unwarped ROIs 
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(a) (b) 


Fig. 5. (a): Photos capturing from different angles; (b): angle calculation for pose variance analysis 


40° 


Fig. 6. Pose variance analysis on different angles along with respective logo appearance 


5 Conclusion and Future Work 


Intelligent TMMS is grasping researchers’ attention due to the evolving infrastructure 
of cities. This is not only becoming complex day-by-day but also demanding more 
innovative solutions. Vehicle classification is now shifting towards more fine-grained 
classification. For this, all such features of vehicles are being considered which are 
distinct and specific. One such feature is the vehicle’s logo. Though, it stands out as 
a unique feature depicting the information regarding vehicle make/model, yet its small 
size is a complex challenge for its detection and recognition. The detection of small sized 
objects like a logo in a surveillance environment where vehicles themselves appear to be 
very small is nearly impossible. Video enhancement techniques increase computational 
cost and time. This paper specifically targeted surveillance environment and presented 
a novel method of detection and recognition of vehicle’s logo. A modified IWPOD- 
NET was introduced to overcome the orientation issues. Then YOLOv5 was used to 
detect logos and EfficientNet was used to classify logos. This pipeline was tested on 
four surveillance environments namely toll control, law enforcement, dashcam, and 
parking lot access control. Comparative analysis showed accuracy improvement with 
this proposed approach in each testing case. Pose variance analysis was also performed 
to determine the orientation limits to which this approach can work. It was observed that 
the proposed approach worked best in the 0-30° range, while it was mostly undetectable 
beyond this range. Secondly, a new dataset, VL-10 was presented which focused on 
local environment settings. The proposed technique in this research work improved the 
results for logo detection and can therefore be used in offline surveillance environment. 
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As part of a future work, the deep learning architecture can be optimized to detect 


and recognize vehicle logos in real-time. Similarly, a completely new deep learning 
architecture can also be designed particularly for this vehicle logo task. 
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Abstract. The intricate temporally prolonged sequences seen in music 
make it a perfect environment for the study of prediction. Melody, har- 
mony, and rhythm are three examples of the structural elements found 
in music. This research incorporates music excerpts prediction by under- 
standing structural details using Markov chain and LSTM models. The 
novel approach compares to state-of-the-art algorithms by predicting how 
a musical excerpt would continue after being given as input. To com- 
pare the variations in prediction and learning, different learning models 
with different input feature representations were utilized. This algorithm 
envisions multitude of usage including next generation music recom- 
mendation system using intra-sequence matching, pitch-tone correction, 
amongst others by integrating with recent advances in deep learning, 
computer vision, and speech techniques. 


Keywords: music excerpt - markov chain - LSTM - sequence 
prediction 


1 Introduction 


Even though the western music scale only has twelve notes per octave, it encom- 
passes the vast majority of the music one is familiar with, from Bach and 
Beethoven to Elton John and Beyonce. The past ten years have seen an increase 
in interest and a diversity of methods for algorithmically creating music using a 
variety of learning models [1]. Can deep learning models be used to study and 
utilize the genius of Mozart or Beethoven? How simple is it to recreate particular 
styles of music that the generation is accustomed to hearing? 

While music has attracted a lot of attention, research on music prediction has 
received less attention. In this research, using a musical snippet, predictions are 
made for following N musical occurrences (for instance, 10 quarter-note beats). 
The actual musical occurrences are then contrasted with the predicted music, 
and the prediction is graded using a scoring system. Experiments leaves one 
wondering on whether music prediction or music generation is trickier. 


2 Data 


The Patterns for Prediction Development Dataset, was used. This data was 
produced by analysing a randomly chosen portion of the Lakh MIDI Dataset!, a 


1 https: //www.music-ir.org/mirex/wiki/2020:Patterns_for_Prediction. 


© The Author(s) 2023 
C. Biele et al. (Eds.): MIDI 2022, LNNS 710, pp. 26-34, 2023. 
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dataset made up of one million popular music tracks. For both monophonic and 
polyphonic songs, the symbolic MIDI format was applied. Each input, or prime, 
corresponds to about 35s of music, and each output, or continuation, includes 
the following 10 quarter-note beats. 


3 Feature Representation 


3.1 Basic Music Notation 


The fundamental component of music and the foundation from which all chords 
and melodies are built are notes. Each note has a pitch and a duration. Pitch is 
essentially the note’s sound frequency. The MIDI music file system can store 128 
different pitches. Quarter-note lengths are used to measure duration, which is 
the length of the note. When several notes are played simultaneously, they form 
chords. Since only one note is played at a time, monophonic music lacks chord 
structure. Polyphonic music has chords. 


3.2 Note and Chord Representation 


Each note was encoded as an integer ranging from 0 to 128 due to the fact that 
notes have 128 pitches in the MIDI file system. The number 129 was used to 
symbolize a rest, or the amount of time between notes. In string format, a chord 
was shown as a collection of pitches. For instance, “68.70.75” might be used 
to denote the chord with the pitches 68, 70, and 75. The training set’s chords 
were all sampled, and a dictionary was built that converted the uncommon 
pairings into integer values. This corresponded to 101,039 distinct combinations 
and consequently input data dimensions for the medium-sized dataset. 

How to handle inputs (chords) that are present in the testing set but absent 
from the training set is one of the difficulties in music prediction. This issue 
is avoided while creating music since the model only generates chords that are 
present in the training set. There will invariably be note combinations in the 
testing data that the model hasn’t “seen” previously while making a forecast. 
The initial strategy was replacing the unfamiliar chord with the most prevalent 
chord in the practice set. 

A more sophisticated approach is to substitute the most frequent chord found 
inside the prime sequence for the unknown chord as the next sequence item. 
Combinatorial proliferation of chord possibilities in polyphonic music makes the 
process a bit challenging. 


Duration Representation: The representation of the durations is a quarter- 
note length. The number of durations may be insurmountable, in principle. The 
duration length is typically spread among a limited set of choices. All of the 
training data’s durations were sampled, and a dictionary was built that con- 
verted each distinct duration length to an integer. The dictionary’s length varied 
depending on the dataset, although it was often less than 100 words long. 
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Combination Representation: The combination of note pitch and durations 
as a tuple was represented initially as a Pre-process step. For instance, the tuple 
was used to represent a note with a pitch of 60 and a duration of 1 quarter-note 
length (60, 1). Each distinct tuple was mapped to an integer value, just how the 
chords and durations were done. 


Interval Representation: Note intervals were used in the final input repre- 
sentation that was assessed. Pitch difference between two nodes is an interval. 
Each interval denotes a semitone of difference between the pitches of two notes 
that are right adjacent to each other. For instance, the pitch difference between 
the notes 60 and 65 is 5 semitones. Negative intervals are used to illustrate drops 
in pitch. There are 256 total possible interval values because there are 128 total 
pitches. Instead of using pitch sequences for training and prediction, interval 
sequences were used for this representation. For instance, the sequence of inter- 
vals for a prime consisting of the four notes 65, 70, 45, and 60 would be 5, -25, 
15. To find the note predictions using the list of interval changes, the last pitch 
of the prime was preserved in a list and utilized as the beginning point. 


4 Methods and Modeling 


4.1 Markov Chain Model 


One of the earliest models utilized for music production was the Markov chain. A 
first-order and a second-order Markov chain was trained and evaluated. Separate 
training sessions were held for notes and durations. This model could be utilized 
with inference methods like restricted inference and beam search [2-4]. 


4.2 LSTM Model 


A Forest of Long-Short-Term Memory (LSTM) network was used for next set of 
experiments. A single network with multivariate input made up one model (both 
notes and durations). For the notes and durations, respectively, there were two 
different LSTM networks in the other model. It was established that notes and 
durations have a significant relationship. Input layer, 64-dimensional embedding 
layer, 512-hidden unit LSTM layer, batch normalization layer, and dropout layer 
with a dropout rate of 0.3 were the layers present in both models. The multi- 
variate model then has two dense layers employing a categorical cross-entropy 
loss and an rmsprop optimizer for gradient descent after concatenating the two 
inputs. The batch size was 64, and there were typically 100 epochs. Experiments 
were performed using more epochs, but the outcomes didn’t change, over-fitting 
was also observed in some cases. Figure 1 below displays the multivariate LSTM 
model’s graph. 
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Fig. 1. LSTM Multivariate Model 


4.3 LSTM Encoder-Decoder Model 


Two different models for the notes and durations, as well as a single multivariate 
model, were trained. The parameters matched those of the LSTM model afore- 
mentioned. The longest prime sequence in the training set served as the input 
sequence length, and the longest target sequence in the training set served as the 
output sequence length. The graph of the multivariate encoder-decoder model 
is shown in Fig. 2. 


4.4 Inference Modeling 


Prediction was performed using the common beam search algorithm with beam 
sizes ranging from k= 2 to k=4. Only the notes from the prime part were used 
for the constrained inference [5]. The following note with the highest score, for 
instance, would be the new option for prediction of song’s prime sequence. 


4.5 Other Techniques 


Transposition was a method used to increase the uniformity of the input data. 
Transposition in music refers to the process of raising or lowering notes and 
chords by a fixed distance. The spaces between succeeding notes remain constant. 
Six major scales and six minor scales make up the 12 different scales in Western 
music. One of these scales is used in the majority of harmonic music. For instance, 
changing a song from the key of C major to the key of E major would require 
adding four semitones or four intervals to each pitch. Depending on whether the 
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Fig. 2. LSTM Encode-Decoder Multivariate Model 


music was originally in a major or minor key, each input song was transposed to 
either C major or C minor. The training then took place using the transposed 
pitches. The prediction notes were created by transposing the prime sequence 
into either C major or C minor before testing. The anticipated target notes were 
then recomposed back into their original key before being contrasted with the 
actual target notes. 


5 Results and Observations 


Both the training timeframes and the outcomes were adversely affected by the 
exponential growth of the input data for the polyphonic music. Scoring was done 
based on cardinality score and pitch score. 


Cardinality Score: Minimizing a general supermodular monotone non- 
increasing function f() with respect to a cardinality constraint. Let zj Es 
f (0) — f(x) and use the greedy algorithm on the set function 


9S) := f(aq) — fw, US), 


which is a monotone non-decreasing submodular set function with g(0) = 0. 
Thus, the greedy algorithm maximizes the function with respect to a cardinality 
constraint within an approximation factor of 1 — 1/e. However, there are two 
caveats. First, the greedy algorithm uses a budget of k — 1 instead of k (budget 
of one is spent on identifying xj). Second, and most importantly, observe that 
the approximation factor is obtained on the function g(S) and not f(S), they 
get a set S of size k — 1 such that 


He- FUS) 21- zę FED- US), 
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where S* is the optimal set of size k — 1 to be added to x} with the goal of 
minimizing f(). 


The Pitch Score. The right “shape” or separation from each note is rewarded 
by the cardinality of continuations. However, it rewards a continuation that is 
transposed one semitone lower in most cases. The pitches are additionally scored 
because this isn’t entirely accurate. Two normalized histograms of the pitches 
from the prediction and the ground truth are made to achieve this. The amount 
by which the histograms are overlapped determines the score. The operation is 
carried out while ignoring octaves as well. The cardinality score and the pitch 
score are combined linearly to produce the final score. 


Baseline. The first baseline estimate included selecting notes from the training 
set at random. The second technique involved selecting notes from the training 
set based on their frequency. The last approach used the same steps as the first, 
but added restrictions required that the sampled note be in the song’s expected 
prime sequence. The third method generated the highest baseline for both the 
polyphonic and monophonic music sets. 


Discussions. Polyphonic and monophonic datasets were evaulated individu- 
ally. 90% of the songs were used for training and 10% for testing in the first 
division of the datasets into training and testing groups. Figure 3 and Fig. 
4 displays the outcomes of numerous training and testing runs for the mono- 
phonic and polyphonic dataset. The LSTM model produced the greatest results 
when it used constrained inference, transposition on the input data, and a longer 
sequence length of 64. Compared to the best baseline score, the model’s accuracy 
increased by over two times. The accuracy of the LSTM model was improved 
by transposing the input and lengthening the sequence. Transposition improved 
all of the models’ accuracy, suggesting that more reliable input data improves 
learning. There were no appreciable improvements in accuracy brought about 
by the beam search method of inference. Prediction accuracy was also improved 
by constrained inference, particularly for the Markov model. 

The notes and duration feature representations for the input data were far 
more accurate than the interval feature representation. This suggests that com- 
pared to pitches, intervals may have less structure and greater unpredictability. 
Unsurprisingly, the results were significantly worse for the input with larger 
dimensions (i.e., the tuple combinations of pitches and durations). There were 
some minor differences between employing a multivariate LSTM model and 
training two distinct models for notes and durations. When all other variables 
remained constant, the multivariate model consistently beat the two indepen- 
dent models by a little margin. This suggests that there is some relationship 
between the anticipated notes and durations. In addition to having a slightly 
higher accuracy, the multivariate models also required less time to train than 
two individual models. 
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Fig. 3. Results - Model Predictions for Polyphonic Music Data 


Across all models, the pitch score was far more variable than the cardinality 
score. When the findings were examined more closely, it became clear that many 
of the projected sequences only had two or three different pitches. As a result, the 
model could determine which pitch or pitches appeared in the target sequence 
the most frequently, but not which order they were played. Additionally, no 
predictions were ever made for notes that in the target sequence seemed to 
be more “random” (i.e., not present in the prime sequence). While music might 
have a strong structural element, it also has randomness or surprises that provide 
appeal. 

Overall, the LSTM model performed better than every other model. The 
Markov models had the highest variance of all the models, indicating that they 
would benefit from further restrictions or domain expertise [6-8]. 

The baseline values for the polyphonic music were significantly higher than 
those for the monophonic music, neither set of results should be used to judge 
the other. Due to the method used to determine the score, the baseline scores 
for the two forms of music are different. 
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Fig. 4. Results - Model Predictions on Monophonic Datasets 
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6 Conclusion and Future Work 


Because music has both structure and randomness, it presents an intriguing 
and difficult prediction problem. Although feature representation is significant, 
feature size is even more crucial. Training is substantially more challenging for 
inputs with very high dimensions. Constraints can significantly shrink the search 
space and improve model accuracy when used in inference. Meaningful learning 
and inference depend on limiting the input space and the search space in ways 
that nonetheless capture pertinent relationships and significant features. 
Random pop tracks were employed in this research. Pop music is just one of 
many different musical styles. Jazz music is more free-form than classical music, 
which has a lot more structure. A fascinating expansion of this research would be 
to compare the results of other musical genres. Models, such as Hidden Markov 
Models and LSTM with attention mechanisms, leave room to wonder [9,10]. 
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Abstract. The industrialisation of the footwear recycling processes is 
a major issue in the European Union—particularly in view of the fact 
that at least 90% of shoes consumed in western economies are ultimately 
sent to landfill. This requires new Al-empowered technologies that enable 
detection, classification, pairing, and quality assessment in a viable auto- 
matic process. This article discusses automatic shoe pairing, which com- 
prises two sequential stages: a) deep multiview shoe embedding (compact 
representation of multiview data); and b) clustering of shoes’ embeddings 
with a fixed similarity threshold to return sets of possible pairs. Each 
shoe in our pipeline is represented by multiple images that are collected 
in industrial darkrooms. We present various approaches to shoe pairing— 
from fully unsupervised ones based on image descriptors to supervised 
ones that rely on deep neural networks—to identify the most effective 
one for this highly specific industrial task. The article also explains how 
the selected method can be improved by hyperparameter tuning, massive 
increases in training data, and data augmentation. 


Keywords: clustering * deep learning * multiview embedding - 
representation learning 


1 Introduction 


The increased availability of affordable mass-produced goods, coupled with 
rapidly changing consumer fashion trends, has resulted in a sharp increase in 
the consumption of products in many industrial sectors. The production of tex- 
tiles for clothing and footwear is expected to increase by 2.7% each year until 
20301. In [8], the authors claim that approximately 5% of the twenty billion pairs 
of shoes produced worldwide every year are recycled or reused. In the European 
Union, it is estimated that the waste that results from postconsumer shoes could 
reach several million tonnes per year. Politicians and members of civil society 


1 https: //www.businessoffashion.com/reports/news-analysis/the-state-of-fashion- 
2021-industry-report-bof-mckinsey/. 
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are starting to implement the Zero Waste to Landfill policy to address one of 
the major challenges of the twenty-first century for the footwear sector. This 
ambitious goal requires extensive support from intelligent information systems. 

The problem considered in this article results from practical need. VIVE 
Textile Recycling is the leader of the textile recycling industry in Poland and 
Europe. The firm is regularly required to process massive deliveries of shoes. 
A conveyor belt transports the shoes to a darkroom, where their profile photos 
are taken (of the vamp, sole, and heel). Next, a management system, ShoeSelec- 
tor consumes information from many external intelligent systems, and decides, 
for example, where the shoes should be thrown off of the conveyor belt, and 
whether they should be paired with their counterparts or remain single shoes. 
Shoe pairing is crucial for the shoes to be resold. 

The primary goal of this article is to present a novel shoe-pairing method that 
can be used effectively in the process of analysing large quantities of shoes that 
are transported on a conveyor belt. We have verified various well-known classical 
approaches, including image descriptors like Central/Hu moments, HOG, and 
SSIM-based approaches, as well as a novel deep-learning-based method that 
outperforms others. 

In our deep learning approach to shoe pairing, we introduced two sequential 
stages: a) multi-image shoe embedding that uses a deep neural network; and b) 
clustering of the shoes’ embeddings with a fixed similarity threshold to return 
pairs (clusters in which size = 2) and singles (unpaired shoes). Each shoe in our 
pipeline is represented by three images (vamp, sole, and heel) that are collected in 
the darkrooms. Sample images are presented in Fig. 1. Our approach is evaluated 
on diverse custom industrial datasets, as provided by Vive Textile Recycling. 


Fig. 1. Sample shoe images. The top row presents all three images (sole, vamp, and 
heel) for three distinct shoes. The bottom row presents the corresponding shoes found 
by the system. 
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2 Related Work 


Shoe pairing can be modelled as clustering of shoes’ compact representations 
within a cluster (maximum size = 2) and a minimum similarity threshold to 
discriminate correct pairs (two-element clusters) against single shoes without 
counterparts. 

Clustering is a fundamental task in the machine learning paradigm whose 
objective is to divide subjects into a number of groups in such a way that subjects 
in the same groups are more similar to each other and less similar to those in 
other groups. Traditional methods cluster subjects on the basis of a single set of 
features per subject [11]. 

Multiview clustering (MVC) is a variant of clustering in which each subject is 
represented by multiple sets of features [2]. MVC has been applied successfully 
to various applications, including computer vision, natural language process- 
ing, healthcare, and social media. Given that our project involves only images 
assigned to shoes, our focus lies in computer vision. 

MVC has been used widely in image clustering [7] and motion segmentation 
[5] tasks. Chi et al. [3] conducted MVC for web image retrieval ranking. Xin et 
al. [14] successfully applied MVC for person reidentification. Typically, several 
feature types, such as HOG [4], LBP [10], and SIFT [9], can be extracted from 
images prior to cluster analysis. 

Since 2012, deep learning has proved outstandingly efficient in a variety of 
applications, such as image classification, speech recognition, object detection, 
and natural language processing. Multiview deep clustering methods demon- 
strated better performance than traditional multiview shallow clustering meth- 
ods. Most of the literature presents multiview deep clustering as the process of 
clustering on the representations obtained from multiview representation models 
built in a supervised manner [2]. 

Multiview representation learning (MVRL) has recently become popular by 
its exploitation of complementary information of multiple features or modalities. 
Recently, due to the remarkable performance of deep models, deep MVRL has 
been adopted in many domains, including computer vision and signal processing. 
One article [15] presents a comprehensive review of deep MVRL in two perspec- 
tives: (1) deep MVRL extensions of traditional methods; (2) MVRL methods 
in full deep learning scope. The first group of methods introduce the advance- 
ments of deep learning models into traditional MVRL methods, such as multi- 
view canonical correlation analysis and matrix factorisation. The second group 
represents pure deep learning MVRL methods, such as multiview autoencoders, 
conventional neural networks, and deep belief networks [15]. 

The deep neural networks used to learn deep representations adopted in mul- 
tiview clustering methods have superior expression ability to reflect multiview 
data comprehensively. Multiview deep representation learning is a method of 
transforming a collection of inputs (in our case, profile images) into compact, 
fixed-size, floating-point representations that exhibit the desired properties. Such 
embedding can be subject to further processing, such as clustering. However, the 
authors of [2] claim that the separate processes of representation learning and 
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clustering entails limitations, such as representation learning being unaware of 
clustering’s goal. Our approach resolves this problem by using learning repre- 
sentations that are tailored to the purpose of our clustering, which is pairing. 

The outstanding efficiency of multiview deep-based clustering and its abil- 
ity to make representation learning aware of the clustering’s goal inspired us 
to develop a novel shoe-pairing process based on the clustering of representa- 
tions obtained from a trained deep multiview representation model (hereinafter 
referred to as deep multiview embedding-based clustering, DMVEC). 


3 The Proposed Approach 


The solution is a web service that is capable of responding to a variety of shoe- 
related requests via REST API. This article describes only the pairing aspect of 
the sytem. Answering pairing-related questions requires the system to perform a 
series of processing steps—some of which are shared with other tasks. The first 
(1) is input preprocessing (e.g. image decoding from base64 encoding, image scal- 
ing, and normalisation). Later (2), shoe detection is performed (empty hangers 
occasionally get photographed). Next, (3) deep multiview embedding of a shoe is 
performed (this step assumes that the embedding model is already trained and 
available for inference). Last (4), clustering of the shoes’ embeddings into pairs 
can be requested. In the two sections below, we describe (3) and (4) in greater 
detail. 


3.1 Deep Multiview Representation Learning 


The deep multiview embedding approach is a supervised method of changing 
the representation of a shoe into a single fixed-size vector based on its three 
images (shoe embedding). The process is based on a deep neural network, which 
is responsible for transforming the three images of a shoe into a compact vector 
of floating numbers. This transformation is trained in such a way that each shoe 
in a pair should be located close to the other, but away from unrelated shoes. 
The Euclidean distances between the embeddings can be used by the clustering 
method as a distance metric. 

This method is a multibranch combination of convolutional layers and dense 
layers. The input comprises three photographs of a shoe (vamp, sole, and heel). 
The output is a single 64-dimensional unit floating vector. In the initial part 
of the proposed deep neural network, we used the convolutional base of the 
VGG-16 network [12], pretrained on the ImageNet dataset”. The results of the 
convolutional base are globally pooled by channels. The use of the convolutional 
base of the VGG-16 network in our method is an example of transfer learning. 
The nonconvolutional part of the network is trained from scratch by freezing the 
VGG-16 weights. The vector obtained as a result of global average pooling of the 
output of the convolutional base is processed using a fully connected layer. Such 


2 https: //image-net.org. 
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Fig. 2. The deep multiview shoe embedding model. vgg_preproc — a preprocessing layer 
in which images are converted from RGB to BGR, before each colour channel is zero- 
centered relative to the ImageNet dataset; vgg — the convolutional base of the original 
VGG-16 network pretrained on ImageNet; gap — global average pooling, the layer used 
for dimensionality reduction; dense - a dense or otherwise fully-connected layer; relu — 
an activation layer in which all negative values are replaced with zero and the remaining 
values are passed to the output; sum — a layer for aggregating the output by adding 
tensors from multiple layers; norm — a layer that normalises the input vector to a unit 
vector. 


a network fragment, called a branch, is repeated between one and three times, 
depending on the number of image types desired. Then, all branches are aggre- 
gated over the corresponding indices (element-wise) and transformed once more 
using the fully connected layer. Last, the 64-dimensional output is normalised so 
that it lies on a multidimensional sphere with a unit radius. Figure 2 presents the 
entire architecture of the deep neural network. To train our embedding model, 
we used triplet learning (namely, the TripletSemiHardLoss and TripletHardLoss 
loss functions), in which each reference (anchor) shoe requires a positive exam- 
ple (a corresponding shoe from a pair) and a negative example (a nonsimilar 
shoe). The deep neural network is trained in a regime that forces similarity (e.g. 
Euclidean-distance-based similarity) between the representations of the reference 
and the compatible shoe, while reducing the similarity of the representation of 
the reference example and the negative example. 


3.2 Clustering 


Given the trained deep multiview representation model (the shoe embedding 
model), a visually represented batch of shoes can be transformed into a set of 
embeddings. The most similar embeddings can then be identified to pair the 
shoes. The desired pairing should connect the elements so that the distance— 
and, therefore, the differences in appearance—is as short as possible (or alter- 
natively, similarity is as high as possible). 

Although we evaluated many methods, considering the heavy deep multiview 
embedding model and specific nonfunctional requirements (such as the number 
of shoes in the clustered population being limited by the capacity of the conveyor 
belt, which is around 1,000-2,000 hangers), the most robust and comprehensive 
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was the greedy method, which is based on agglomeration clustering with an 
additional termination condition. It works as follows: 


1. Create a collection of unpaired shoes L; 

2. Create an empty collection of result pairs P; 

3. Find two shoes la and ly in Z that are separated by the shortest Euclidean 
distance between their embeddings, which does not exceed the threshold t; 

4. Remove them from Z and add a pair (la, lẹ) to P; 

If at least two items remain in L, go to step 3; 

6. Return pairing P plus the remaining unpaired shoes as singles. 


or 


4 Results 


In the initial phase of our experiments, we had limited access to labelled data. 
Approximately 1,000 hand-labelled pairs of shoes were split into training (80%) 
and test (20%) sets. Using these relatively small datasets, we evaluated various 
methods to identify the most promising one. At this point, we suspected that 
we using too little data to fully unlock the benefits of deep learning. We opted 
to use transfer learning, which is helpful in cases of scarce data. 

We verified four unsupervised classical methods: Central and Hu moments, 
HOG, and SSIM. We used a special evaluation measure called pairing accuracy, 
which is the percentage of shoes that are correctly paired (or unpaired). We 
define pairing accuracy as the number of shoes with correct assignment (properly 
paired or properly unpaired) divided by the number of shoes considered. 


Table 1. The quality evaluation of the shoe-pairing methods on the test set mea- 
sured by pairing accuracy; VAMP/SOLE/BOTH indicates from which images the 
shoe-pairing model was inferring. 


Method VAMP | SOLE BOTH 
DMVEC 0.73 0.95 | 0.99 
Central moments | 0.02 0.01 | 0.01 
Hu moments 0.01 0.01 (0.02 
HoG 0.03 0.05 | 0.08 
SSIM 0.05 0.09 | 0.10 


Table 1 presents the results of the shoe-pairing experiments on our initial 
test set. The first row presents the proposed DMVEC approach (deep multiview 
embedding-based clustering). The subsequent rows present the results obtained 
by our unsupervised classical baselines: 


1. Central moments — the shoe images, separately or joined horizontally in one 
image, are described by image descriptors called central moments [6] before 
clustering is applied; 
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Fig.3. Adjustment of the similarity threshold is a crucial task in shoe pairing. The 
image on the right shows the similarity between shoes in proper and incorrect pairs in a 
set of 34,000 pairs returned by a prototype system and next labelled by human taggers. 
The image on the left demonstrates how we tuned the hyper-param-similarity threshold 
using the 1,000-shoe validation set. We counted partial errors and accuracy (acc) for 
each threshold setting. The legend presents the errors as a ratio of TP/FP/TN/FN to 
all shoes in the paired set. T'hr is represented by the vertical line that corresponds to 
the threshold value that reports the highest accuracy on the 1,000-shoe validation set. 


2. Hu moments — the shoe images, separately or joined horizontally in one image, 
are described by image descriptors called Hu moments [6] before clustering is 
applied; 

3. HoG — the shoe images, separately or joined horizontally in one image, 
are described by image descriptors called histogram of oriented gradients - 
(HoG) [13] before clustering is applied; 

4. SSIM — the shoe images, separately or joined horizontally in one image, are 
compared one vs one using a structural similarity index measure (SSIM) [1]. 
SSIM distances are used for clustering. 


The initial experiment revealed that DMVEC obtained the best results; how- 
ever, the test set consisted solely of pairs i.e. all shoes in the test set had their 
counterparts within the test set. This situation is a borderline one; usually, the 
ratio of paired shoes in a real-world population of shoes under pairing is between 
50% and 90%. The appearance of singles in a population demands the introduc- 
tion of a threshold, which prevents further pairing and returns the remaining 
shoes as singles. We received a 1,000-shoe validation set from Vive Textile Recy- 
cling that represented the most common distribution of pairs in a population. 
In Fig. 3, we present how we tuned the similarity threshold value based on the 
validation set. 

The next phases of the project involved conducting experiments on diverse 
test populations that contained significant proportions of singles. The results 
of the experiments revealed that accuracy reported a huge drop from an initial 
99% to around 80-90% (lower scores when more singles were in the population; 
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Fig.4. Training and validation (on the test set) metrics concerning our best pairing 
model. The model was trained using the maximum number of true pairs collected 
over two years of labelling efforts. We accumulated 108,000 shoes in pairs, and vali- 
dated/tested after each epoch using 2,000 shoes, of which 1,000 were paired and 1,000 
were singles (the x-axis represents epochs; the y-axis represents accuracy - left axis and 
loss score - right axis). 


80% was reported when we had equal numbers of paired shoes and singles). 
After investigating the drop in pairing accuracy, we reached two conclusions. 
(1) Finding pairs in a population that does not contain singles is a significantly 
easier task than pairing a general population. The threshold is challenging to 
discover; embeddings are typically not separable by any single threshold and 
this introduces a tradeoff between precision and recall. (2) The difficulty of shoe 
pairing depends inherently on the size of the population. This differs from other 
tasks, like classification, in which scores are independent of validation set size. 
This happens because in pairing, examples in the dataset are not independent 
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of each other. This can be also seen from the perspective of a solution space, in 
which a combinatorial explosion appears in the number of possible pairings. 

We decided that our training dataset was insufficiently representative for deep 
multiview representation learning, and that a more robust embedding model was 
demanded. We manually labelled tens of thousands of pairs returned by a proto- 
type system in a preindustrial environment. After more than a year of labelling, 
we had collected approximately 54,000 proper pairs. In Fig. 4, we present the 
training and validation process using our largest training dataset and data aug- 
mentation (cropping, saturation, and hue disturbance). The trained model is 
highly robust and reports high accuracy scores during clustering on the test 
set—even when it was tested on a very difficult set (1,000 shoes in pairs and 
1,000 singles), it presents accuracy above 97% after the 10th epoch. 


5 Conclusions 


This article presents a novel approach to shoe pairing, a dual-stage approach 
that comprises deep multiview shoe embedding and clustering. We evaluated 
different approaches to shoe pairing, from classical unsupervised ones based 
on image descriptors to the proposed supervised one that applies deep neu- 
ral networks. The best-performing model in this task is the proposed supervised 
method, DMVEC. It reports almost 100% accuracy on the test sets that exclu- 
sively comprise shoes in pairs, and at least 97% when singles cover almost half of 
the test population. Over a broad number of tests, we evaluated different test sets 
(with different size and various distributions of singles). We also demonstrated 
how our selected method can be improved by hyperparameter tuning (similarity 
threshold tuning), massive increases in training data, or data augmentation. 

In the future, we plan to conduct research on further optimisations for 
DMVEC. We suspect that multiple factors can impact the pairing accuracy of 
our model, such as the margin hyperparameter in triplet learning, the schedule 
of TripletHardLoss and TripletSemiHardLoss during training, or the GAP layer 
discarding too much information. There are also other necessary tasks apart 
from pairing, which will be addressed soon. 
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Abstract. Reinforcement learning (RL) is one of three basic machine 
learning paradigms, alongside supervised learning and unsupervised 
learning. Reinforcement learning algorithms have become very popular 
in simple computer games and games like chess and GO. However, play- 
ing classical arcade fighting games would be challenging because of the 
complexity of the command system (the character makes moves accord- 
ing to the sequence of input) and combo system. In this paper, a cre- 
ation of a game environment of The King of Fighters '97 (KOF ’97), 
which implements the open gym env interface, is described. Based on 
the characteristics of the game, an innovative approach to represent the 
observations from the last few steps has been proposed, which guarantees 
the preservation of Markov’s property. The observations are coded using 
the “one-hot encoding” technique to form a binary vector, while the 
sequence of stacked vectors from successive steps creates a binary image. 
This image encodes the character’s input and behavioural pattern, which 
are then retrieved and recognized by the CNN network. A network struc- 
ture based on the Advantage Actor-Critic network was proposed. In the 
experimental verification, the RL agent performing basic combos and 
complex moves (including the so-called “desperation moves”) was able 
to defeat characters using the highest level of AI built into the game. 


Keywords: Reinforcement learning - KOF ’97 - Arcade fighting 
game * Python - pattern recognition - Artificial Intelligence - Neural 
Network - CNN: graphical representation - agent’s reward system - 
transfer learning 


1 Introduction 


Since DeepMind’s AlphaGo defeated the world champion in the ancient GO 
game in 2015, with the rise of deep learning, more and more researchers have 
been dedicated to studying reinforcement learning (RL). One of the favorite RL 
© The Author(s) 2023 
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study field is playing video games. Since “Street Fighter 2” was published on 
the arcade platform in 1991, fighting games have taken the world by storm, 
and the fighting genre has become an essential category in video games. Many 
researchers have studied applying RL techniques to fighting games such as the 
2D game “Super Smash Bros” [1], 2.5D game “Little Fighter 2” [3], and 3D game 
“Blade & Soul” [5]. In [1], researchers used information such as player's location, 
velocity, and action state read from game memory as observations instead of raw 
pixels. They used two main classes of model-free RL algorithms: Q-learning and 
policy gradients, and both approaches were successful even though the two algo- 
rithms found qualitatively different policies from each other. After training the 
agent to fight against in-game AI, a self-play curriculum was utilized to boost the 
agent’s performance, making the agent competitive against human professionals. 
In [3], since in 2.5D games, 3D objects in a scene are orthographically projected 
to the 2D screen, the main challenge is the ambiguity between distance and 
height. To tackle this problem, in addition to an array of 4-stacked, gray-scaled 
successive screenshots, the location information of both the agent and the oppo- 
nent read from the environment was used as an observation. A CNN was used to 
extract visual features from the screenshots, and other information features were 
concatenated with the CNN’s output. They also utilized LSTM in the network in 
order to enable the agent to learn proper game related and action-order related 
features. In [5], a novel self-play curriculum was utilized. By reward shaping, 
three different styles (Aggressive, Balanced, and Defensive) of agents were cre- 
ated and trained by competing against each other for robust performance. By 
introducing a range of different battle styles, diversity in the agent’s strategies 
were enforced, and the agents were capable of handling a variety of opponents. 

Compared with classical fighting games on the arched platform, such as the 
“Street Fighter” series and “The Kind of Fighters” series, the command system 
of games mentioned above is relatively simple: special moves are triggered either 
by combinations of direction keys and action keys or by dedicated keys. While in 
the game of KOF ’97, the movement system is more complex. Except for basic 
moves such as a punch or a kick, special and desperation moves’ commands 
consist of a sequence of joystick movements and button presses. A desperation 
moves’ command is longer than special moves’ and using a desperation move 
consumes a power stock indicated by a green flashing orb at the bottom of the 
screen. The movement and combo systems require from the RL agent to master 
the ability to complete relatively long sequences of accurate inputs. 

To the best of our knowledge, till now, there is no RL research dedicated to 
arcade fighting games like KOF ’97, which typically feature special moves and 
desperation moves that are triggered using rapid sequences of carefully timed 
button presses and joystick movements. The aim of the study described in this 
paper was to develop a RL agent which can not only win the game KOF ’97 but 
also masters a complex game movement system. In order to achieve this goal, 
special attention was paid to the effective representation of observations in the 
RL environment. 
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2 Environment Setup for KOF ’97 


2.1 Interaction with Arched Emulator 


An arched game emulator called MAME is used to set up the python game 
environment. It is possible to drive MAME via Lua scripts externally, and the 
event hook allows interaction with the game at every step. An open source 
Python library called MAMEToolkit [4] encapsulates Lua scripts operation into 
Python classes. The KOF ’97 game environment, which implements the Gym 
env interface, is built based on this python lib. Figure 1 shows a simplified com- 
munication process between the game environment and MAME emulator. Whole 
source code, including game environment, training, and evaluating, is available 
on GitHub! 


MAME emulator KOF '97 Env 


KOF '97 Game Lua Console MAMETookit 


lua scripts 


Controller scripts output 
———> 


actions 


observations 


Fig. 1. Simplified communication process 


2.2 Action Space 


There are eight basic input signals: UP, DOWN, LEFT, RIGHT, A, B, C, and 
D. The action space has 49 discrete actions, including all meaningful combi- 
nations of the basic input signals. One of the problem is that direction inputs 
in the commands are based on the character's orientation, i.e., if the character 
changes its orientation, all direction inputs in the commands should be flipped 
horizontally. Since the agent operates 1P’s (Player 1) character, most of the 
time, the character is facing to the right side. It makes the training samples 
unbalanced. Moreover, the experience learned from the default orientation could 
not be applied to the other side. A trick of action flipping is used to tackle 
these problems. In the action space, FORWARD and BACKWARD commands 
are used instead of LEFT and RIGHT, and the environment will transfer FOR- 
WARD and BACKWARD to the LEFT or RIGHT according to the character’s 
orientation. With the help of this trick, the agent’s win rate increased to over 
90% from 70% after 30M timesteps’ training. 


I Source code address: https://github.com/duhuaiyu/AIbotForKof97. 
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2.3 Observation 


The observation vector from a single step consists of several parts. The first part 
contains basic information (such as characters’ HP, location, POW state, and 
combo count). All scalar features are normalized. The second part is the charac- 
ters’ ACT code, which represents the character’s action. ACT is a concept in the 
game implementation mechanism. The character’s ACT code and corresponding 
action can be checked in debug mode of the game. A character’s move consists 
of one or more ACTs, and each ACT consists of several animation frames. As 
Figs. 2 shows, the character’s “100 Shiki Oni Yaki” Move consists of 3 ACTs 
(Codes are 129, 131, and 133, respectively) which denote the start, middle, and 
end of the move. Each character can have a maximum of 512 ACTs. Although 
some of the ACT codes are not used, for compatibility, a 512 bits one-hot encod- 
ing vector is used to represent the ACT code feature. The basic information and 
characters’ ACT code can be read from the game memory. The last part is the 
agent’s input, which is also a one-hot encoding vector. 


ACT 129 ACT 131 ACT 133 


Fig. 2. “100 Shiki Oni Yaki” Move's ACTs 


2.4 Graphical Representation of the Input and ACT Sequences 


For readers can have a better understanding of the approach proposed below, 
it is worth to mention about the command judgment system of the game. As 
mentioned, to make a special move or desperation move, the player needs to 
finish a sequence of input in a short period. The game system would determine 
which move the character is going to do according to the input history. As human 
input varies in rhythm and precision, the judgment of input sequence has some 
flexibility. Take the move “hopping back” for example. The command is “— 
—”. As Fig. 3 shows, as long as the player can finish such an input pattern in 
8 frames, the character can make the move successfully. In case 1 and case 2, 
although activated keys last for a different time, the input pattern = =” is 
finished in 8 frames, so the character can make the hopping back move. In case 
3, such an input pattern is not finished in the time window, so the character 
failed to make the move. 

The observation features from a single step are not enough to capture all 
important information since, on the one hand, the characters move highly 
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current frame 


1 
frame index -8| -7| -6| -5| -4| -3] -2| -1| 0 
case 1 — accept 
case 2 accept 
case 3 - reject 


8 frames window 


Fig. 3. Hop back command's input judgment 


depends on the recent input history; on the other hand, according to the sequence 
of actions the characters have already made and for how long these actions have 
lasted, the player can make more accurate predictions and make an optimal 
response. In order to preserve the Markov’s property, it is necessary to maintain 
the character’s input sequence and ACT sequence within a specific time window. 
But simply concatenating features from several steps is not a good idea. Firstly, 
the feature vector for each step is long (over 1000 per step), and at least 30 
frames’ feature is needed since desperation move’s commands are pretty long. It 
means more neurons for the network are needed, and training such a network 
would be slow and difficult. Besides, because of the command judgment system’s 
flexibility, the input sequence can vary for the same move, and it is hard for an 
MPL network to find such a pattern. Our idea is to stack the 1P’s input sequence, 
1P’s ACT sequence, and 2P’s ACT sequence as three binary images. Figure 4 
shows our approach to the representation of the feature sequences in the form of 
binary images, and these images encode the character’s input and behavioural 
pattern, which are then retrieved and recognized by the CNN network. Although 
input sequences can differ for the same move, the input sequences will form a 
similar pattern on the graphic representation, and CNN is good at image pattern 
recognition. 


2.5 Reward System 


Based on the goal, the following reward system inspired by the reward system 
described in [1] is proposed: 


Rectal = RiP-damage _ p2P-damage _ distance = ptime _ pPow 


where 
RiP-damage = ad? x clP x y1? 
p2P-damage = d2? % 2P k AP 
pdistance = Mazx(0, 9 = B) $ PARSE 
pime = |x ame 
pPOW — 5 if a POW orb consumed 
~ |0_ otherwise 
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Except for basic features, it is 
necessary to maintain other 

features for last N steps as an 
observation to preserve 
Markov's property. 


Sequences of stacked vectors from 
successive steps create binary 
observation feature vector for each step images. These images encode the 
character's input and behavioral 
pattern, which are then retrieved 
and recognized by the CNN network. 


stacking last N successive steps’ features 


Fig. 4. Graphical representation of the input sequence and ACT sequences 


The first two terms are a damage reward and a damage penalty 
(RiP-damage | p2P-damage) The priority is to win the game, so it is quite natural 
that if the character deals damage (d'”), the agent gets a reward. In contrast, if 
the opponent deals damage (d?”), the agent gets a penalty. In order to encour- 
age the agent to perform more combo, the damage reward and penalty will be 
amplified linearly with the combo numbers(c!”, c?”). On top of that, two fac- 
tors: y1? , and 72”, are introduced to balance the reward and penalty caused by 
damage. To prevent the agent from behaving too aggressively, c?” is set to 1.3, 
and cl is set to 1. 

Except for the damage reward and penalty, some other terms are also intro- 
duced to adjust the agent’s behavior. The distance penalty term P#stance jg 
used to make the agent keep a reasonable distance between the two characters. 
If the distance exceeds a certain threshold , the agent will get a small penalty 
controlled by the factor y“**¢"°*, The term PFT gives the agent a small penalty 
at every step. It is used to encourage the agent to deal damage and win the game 
as soon as possible. The last term in the formula is P”?W. It gives a moderate 
penalty (specifically 5) when the POW orb is consumed, and if the desperation 
move can hit the opponent later, the reward will counter-weigh this penalty. 
Otherwise, the agent will be punished for using the POW orb for nothing. 


2.6 Proposed Network Structure 


Figure 5 shows the overall structure of our proposed network. The network’s 
input consists of four parts: basic features as a vector, stacked input sequence 
(last 40 frames), stacked 1P characters’ ACTs (last 36 steps), and stacked 2P 
characters’ ACTs (last 36 steps). There are three CNNs to extract features from 
these images, and the results of these CNN extractors are concatenated with the 
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basic features as the input of our Action Network and Value Network. Figure 5b 
shows the structure of CNN network. The proposed network is referred to as 
Multi-CNN in the following content. 


Input Feature extractors Contact Policy Network 
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Fig. 5. Proposed network structure 


3 Experiment 


3.1 Training Process 


The game’s difficulty level (Built-in AI level) was set to 8, which was the hard- 
est setting. Built-in AI refers to a programmed action strategy that is used to 
fight against players in this game. Even for intermediate-level gamers, winning 
a level 8 built-in AI is quite challenging. On top of that, to simplify the training 
process, the game model was changed from team-play to single-play. The chosen 
RL Algorithm was Proximal Policy Optimization [6]. The model’s training was 
carried out on a personal computer with a 12-cores CUP and an RTX3060 GPU 
with 12 environments running in parallel. 
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3.2 Results 


Multi-CNN Model and Transfer Learning. Each model was evaluated for 
50 episodes. The first model’s agent operated the character Iori to fight against 
Kyo Kusanagi. As Tablel shows, after 54h of training, the first Multi-CNN 
model achieved a 100% win rate. A random agent was made as a negative base- 
line, and the win rate of the random agent is only 18%. Based on the first 
Multi-CNN model, using transfer learning, three more models were trained to 
fight with another three arbitrary chosen characters: Goro Daiman, Mai Shi- 
ranui, and Billy. After 30M timesteps’ training, as Table 1 shows, the transfer 
learning models also achieved similar performance. As Fig.6 shows, the trans- 
form learning models’ performance increases more rapidly than the one learned 
from scratch, meaning that the experience learned from the previous model can 
be transferred to new models. The agents can defeat their counterparts in a short 
time and have lots of HP remaining. Besides, our agents are able to perform more 
combos and desperation moves than the random agent. Overall the Multi-CNN 
models are more than capable of winning the game by a considerable margin. 
Sample videos of the Multi-CNN model agents are available on Youtube?. 


Table 1. Test Result 


Model Transfer | Opponent Win Rate | Total Training | Mean Std 
Learning Timesteps | Time Reward | Reward 
Multi-CNN | No Kyo Kusanagi 100% 50M 54h 157.4 48.2 
Multi-CNN | Yes Goro Daimon | 100% 30M 31h 156.6 44.6 
Multi-CNN | Yes Mai Shiranui | 100% 30M 30h 111.0 76.0 
Multi-CNN | Yes Billy 100% 30M 33h 151.0 86.0 
LSTM No Kyo Kusanagi | 46% 50M 43h —27.6 105.5 
Random — Kyo Kusanagi 18% = - —140.2 | 83.4 


category 
— Multi-CNN VS Kyo 

— Multi-CNN VS Daimon 

— Multi-CNN VS Mai 

— Multi-CNN VS Billy 

— LSTMVS kyo (1 layer 768 units) 


Mean Reward 


Steps 1e7 


Fig. 6. Mean rewards of Multi-CNN models and LSTM model 


2 Sample videos: https://youtu.be/v22Le-c9Uak. 
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LSTM Network for Comparison. The LSTM network is well-suited to tackle 
tasks based on time series data since the cells can remember values over arbitrary 
time intervals. In order to illustrate our proposed network’s validity, an LSTM 
network was constructed for comparison. As Fig. 7 shows, the structure is similar 
to the Multi-CNN’s. The only difference is that the multiple CNN extractors are 
replaced with an LSTM module. As Table 1 shows, given the same budget (50M 
timesteps), the LSTM model can only achieve a 46% win rate. As Fig. 6 shows 
(the purple line), at the first 5M steps of training, the performance increases 
steadily, then it begins to fluctuate without apparent improvement. Whereas 
in the Multi-CNN model, the reward gradually increases as the step increases. 
Figure 8 shows that more LSTM models with different numbers of hidden layers 
and hidden units were trained for 10M timesteps, but none of these models was 
promising. Another observation is that the training process dramatically slowed 
as the number of hidden layers and units increased. 
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Fig. 7. LSTM network structure 
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Fig. 8. Mean reward over timesteps of more LSTM models 
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3.3 Discussion 


Even though our agent can perform some simple combos and defeat the built- 
in AI by a large margin, the agent failed to learn advanced combo skills. It is 
because performing advanced combos needs a long sequence of accurate inputs at 
the right time. Such opportunities are rare unless the player intentionally creates 
them. Furthermore, at each step, our agent has many actions to choose from, 
so purely by random exploration, it is almost impossible for our agents to get 
such an experience. Some studies, such as [2,7], combined supervised learning 
and reinforcement learning to make the training process more effective. In the 
first stage, they used supervised learning to make the model imitate human 
operation, and in the second stage, they used RL to enhance their model. It 
would be a promising direction for further improvement. 

Atari games are usually used as a testbed for RL algorithms, and the opera- 
tion skill of such games is relatively low. Whereas in reality, manipulating things 
is often more complex and involves a sequence of operations with some patterns. 
The Recurrent Neural Network (RNN) is commonly used to process time series 
data. However, in such models, more recent data usually impact the results more, 
so it is not quite efficient for detecting behavioral patterns with some flexibility. 
The proposed graphical representation of sequences and network structure pro- 
vides a new idea for detecting and simulating complex action patterns that are 
required by more practical tasks. 


4 Conclusions 


In this paper, a lot of valuable experience from the previous research is applied 
to our task: playing the arched fighting game KOF ’97. Based on the features of 
this game and our goal, a novel graphical representation of the input sequence 
and characters’ action sequence is proposed. The input pattern and charac- 
ters’ behavior pattern are well extracted by the proposed Multi-CNN network. 
Besides, a contribution to the RL community has been made by adding a new 
game environment of KOF ’97, which is one of the most iconic arcade fighting 
game. 

The experiments show that the agent can not only win the game by a large 
margin but also learns to perform some combos and desperation moves according 
to the situation. They also show that the experience learned from fighting against 
one character can be transferred to fighting against other characters via transfer 
learning. 
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Abstract. This paper presents the concepts of a universal, modular 
framework that shall enable rapid development of AI applications. The 
goal of the conceptual tool is the versatility in terms of changing the 
environment, as well as integration of different types of sensors to perform 
a specific task. The framework follows the Sens4U approach that will 
facilitate the entire process of building the AI applications and simplify 
the testing process. 


Keywords: AI: Artificial Intelligence - AI Application - Framework 


1 Motivation 


Artificial intelligence (AI) has grown in popularity in recent years. AI has been 
applied in many scenarios, from intelligent assistants to advanced real-time vision 
systems. AI enables devices to make decisions based on data collected from 
cameras or various sensors. This data is processed and the device responds cor- 
respondingly. Artificial intelligence (AI) is a part of computer science domain 
which builds systems or machines that in specific tasks can imitate human intel- 
ligence. They are also able to constantly improve their performance according 
to collected data. Artificial intelligence is often associated with humanoid robots 
that are believed to replace humans, but their real purpose is to increase human 
capabilities instead [1]. Artificial intelligence is divided by the mechanism of 
complexity. 

Artificial Weak/ Narrow Intelligence is the type we are using nowadays. 
Most of current AI systems can be classified as narrow artificial intelligence. All 
software using technologies such as natural language processing, Machine learn- 
ing, pattern recognition, data mining is considered as narrow artificial intelli- 
gence. Narrow artificial intelligence focuses mainly on tasks in which human 
beings can be outperformed. 

Artificial Strong/General Intelligence is the concept of an intelligent 
system with a great deal of knowledge and cognitive ability, which will be able to 
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independently perform tasks even those not yet known. It should think similarly 
to a person and with comparable or even better efficiency. At present, there is 
no machine that can understand the world as a human being can. 

Artificial Super Intelligence is the most popular term circulating in the 
computer science world in recent times. An artificial super intelligence will hypo- 
thetically be able to interpret and to understand human behaviour and/or intel- 
ligence. It will be able to surpass human intelligence in mathematical tasks, 
science, medicine or applying solutions to different types of problems. What is 
more, artificial super intelligence is assumed to be able to evoke understanding, 
emotions or to be capable of having its own desires. 

Artificial intelligence is an idea that integrates human intelligence with 
machines, as shown in Fig. 1. An issue related to artificial intelligence is ML 
(Machine Learning), which is one of the subcategories, that belongs to the area 
of artificial intelligence. Its main goal is to allow computers to learn using specific 
algorithms. 


Machine 
Learning 


Deep 
Learning 


Fig. 1. Artificial intelligence structure. 


This paper presents the concept of a universal, modular tool that will enable 
rapid construction of AI applications. The goal of this conceptual tool is its 
versatility in changing environment and different sensor types integration needed 
to perform a specific task. The concept of the proposed tool using the Sens4U 
approach [2], will improve the construction process, testability and reliability 
of the application. The idea of Sens4U is based on accelerating the application 
construction for non-experts in the field by using the modularity of available 
libraries. 

Currently, there are many tools available for training deep neural networks, 
and many examples of software codes that support the sensors needed to create 
an intelligent robot or implementation of trained models. However, putting these 
together and making it fully function requires a large time investment. 
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When working on the SpaceRegion project, the initial focus was to build a 
stationary robot with no mobility capability. The robot is equipped with video 
cameras and servos to control the cameras, and its task was to observe and 
track detected objects. In the course of the work, it was decided that a mobile 
robot would be additionally built for the project. Initially the mobile robot was 
supposed to have the same functionality as the stationary robot, but during the 
code implementation to the mobile robot it turned out that most of the software 
code was the same, and only the code responsible for controlling the robot chassis 
has changed significantly. 

This gave rise to the idea of building a universal tool that would speed 
up the process of developing AI applications. This approach can be also used 
to move one application from a particular scenario to another, for example a 
consumer solution to a space solution, without re-implementing the application 
from scratch. A universal tool must define application areas and requirements 
that simplify the migration process between scenarios. According to the Sens4U 
approach, the application area of the proposed tool will be expandable, by adding 
support for new sensors or updating those already implemented in the conceptual 
tool. 

This paper is structured as follows. Section 2 discusses the related work, while 
Sect. 3 describes the proposed approach, followed by the example scenarios given 
in Sect. 4. Section 5 concludes the paper. 


2 Related Work 


At present, one can find systems that provide users with libraries and tools to 
help building robot applications, such as Robot Operating System (ROS) [3] or 
CubeSystem [4]. ROS is a meta operating system focused on building robots, 
which main goal is to support users in recycling of previously written code for 
robotics research or development purposes. ROS has a Gazebo [5] robot simulator 
that allows applications to be developed virtually, prior to the real robot, saving 
time and cost of modifications that might arise during construction. Another 
advantage of the ROS system is language versatility, thanks to which software 
can be written in such programming languages as Python [6], C++ [7], Lisp [8]. 
The big disadvantage of this system is the lack of multi-platform The stable 
version work only on Unix [9] systems, but there are also experimental versions 
for Windows system. 

ROS is open source software which allows the community to create new 
solutions. CubeSytem on the other hand is a collection of hardware and pro- 
grammable components that enable rapid prototyping of robots. CubeSystem in 
combination with the RobLib library [10] allows customization and linking to a 
distinct set of libraries that are tailored to a specific application. For example, 
the RobLib library contains general functions for controlling robots based on a 
two-wheeled differential drive. It is generally difficult to find systems or sets of 
libraries that have standardized functionality for sensors of the same type. The 
operation complexity level of the application is raised by a number of additional 
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software, libraries and an adequate operation system that have to be installed. 
Fulling of such requirements can be problematic for a user unfamiliar with the 
field. Given the existing software and libraries, it seems reasonable to present 
the concept of a tool that is functional, modular, and cross-platform to support 
the construction of a complete application. 


3 Proposed Approach 


The concept of the tool for fast and modular construction of the application is 
shown in Fig. 2. The fact is almost every available sensor on the market has ready- 
made hardware drivers as well as dedicated software. The aim of the proposed 
tool is to create software, which will be a set of dedicated libraries, but any 
introduced modification in the conceptual tool will be applicable on different 
sensors. As a result, it will always return the same output, assuming that all the 
requirements specified during the construction of the conceptual tool are met. 
The idea for the proposed tool emerged during working on the development 
of a computer vision application. The programming language used to build the 
application is Python version 3. Python is a high-level programming language, 
and thanks to its simplicity and versatility, it has become an ideal tool for 


building AI systems. 
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Fig. 2. Concept of the proposed tool. 
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The base AI application that was created for the project had to observe the 
environment, recognise people in the same room and check from time to time 
whether the person is still in the same place. Based on the people present in the 
room, the robot estimated in which office it was located. T'he basic application 
was initially built on a stationary robot, which is the QBO ONE [11] produced 
by Thecorpora as shown in Fig. 3. 


Fig. 3. Robot QBO. 


Further, the plan was also to investigate the possibility of a smooth transfer of 
the application samples to more demanding environments (like space). In order 
to support that, the application should fulfil its tasks when the original sensors 
and actors are replaced by devices that fit the new environmental conditions. The 
application should thus use generic application programming interface (API) to 
apply the functionalities of the devices and for instance adapt to their physical 
features (video resolution of a camera module or the resolution of a motor turn). 

Replacing a device with another one does not change the application imple- 
mentation although it can change its internal behaviour and features. The API 
equips the application with the control over the device and provides information 
about its features. The application developer who uses the conceptual tool will 
be able to obtain a list of available functions for a given sensor. When building 
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the tool, it would be necessary for the user to define the communication interface 
and to enter the parameters so it could work correctly. 

For example, a user is building a robot with built-in DC motors. Using the 
API functionality of the conceptual tool, the user would be able to call, for exam- 
ple, the getFeaturesDCmotor() method. This method will return information to 
the user about parameters that should be entered and about available functions 
for DC motors e.g., turnForward(), turnBackward(), stop(). 

Another functionality of the tool concept is its modularity, which aims to 
support the seamless modification of the project. As an example lets take the 
modification of the mentioned robot. The user replaces the standard DC motors 
with stepper motors, which brings a number of changes to the created main 
application. However, when using the proposed conceptual tool, the only mod- 
ification that the user should provide, would be possibly a name change of the 
used motors. The basic functionality would be consistent for both motors and 
the program operation after the motors replacement would not be disturbed. 

What’s more, after modifications, calling the getFeaturesSmotor () method 
for stepper motors will display additional functions, besides turnForward (), 
turnBackward (), stop() stepper motors can also have functions like turnFor- 
wardAboutAngle (x_ angle), turnBackwardAboutAngle (x_ angle) and turnLeft- 
AboutAngle (x_angle), turnRight-AboutAngle (x_angle), setStartPosition() 
while maintaining the requirements related to the motor mounting direction 
specified during the tool construction. 

The proposed tool will allow expansion with new libraries depending on the 
users’ needs. The main advantage of the conceptual tool would be reliability to 
environmental changes, and unification of functionality for sensors of the same 
type, what would result in rapid modifications in the design. During the construc- 
tion of the tool, it will be possible to extend its operation to support different 
types of sensors. Moreover, sample functionalities such as object detection on the 
provided video stream or real-time object tracking from trained deep learning 
models can be implemented in the tool for artificial intelligence purposes. 

For devices with low processing power, the proposed conceptual tool will be 
able to implement functionality to support AI accelerators. The AI accelerators 
will enable the satisfactory performance of artificial neural networks combined 
with computer vision. AI accelerator is, for example, the Intel Neural Compute 
Stick 2 [12]. One advantage of the proposed tool will be its cross-platform capa- 
bility, which would allow the tool to run on both Microsoft Windows and Linux 
platforms. 


4 The Application Area 


This section presents examples of scenarios from which requirements for the 
correct operation of the proposed tool may arise. Some of the scenarios have been 
implemented, some are in the process of being implemented and some may be 
created in the future for development, experimental or research purposes during 
the development of the proposed conceptual tool. While creating the application, 
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scenarios based on terrestrial solutions were used for testing reasons and finding 
available solutions. However, the conceptual tool also envisions applications in 
extraterrestrial scenarios. 


4.1 Watchman Scenario 


The first developed scenario is the Watchman scenario. The main task of this 
scenario is to monitor a specific room and control people authorized to be in 
it. The system based on optical sensor and machine learning must be able to 
detect, recognize and perform verification of a person present in the room. The 
next step that the created system must take is to react during the detection of 
a possible intruder. During the detection of an intruder, the task of the system 
is to take a picture of a person who is not authorized to be in the room and 
record the event in the database for further verification by the administrator. 
The system must be adapted to expand the database of people authorized to 
stay in the room. 


4.2 Parkmonitor Scenario 


The second scenario is the Parkmonitor scenario. The main task of this scenario 
is to monitor a specific area and count free and occupied parking spaces. System 
using optical sensor and machine learning must be able to detect and return 
information about number of free and occupied parking spaces. 


4.3 Tracker Scenario 


The third scenario is the Tracker scenario. The main task in this scenario is 
to track a specific object by an unmanned craft (drone). For example, such an 
object can be a ball, a cyclist, a person or a car. In general, any model of object 
or objects that can be trained using neural networks suit this scenario. The 
system, based on the optical sensor and machine learning, after activating the 
function, must be able to follow a specific object until it disappears from the 
frame or the function is disabled. The drone operator must be able to activate 
and deactivate the tracking function. 


4.4 Mapping the Environment Scenario 


The fourth scenario is the Mapping the Environment scenario. The main task 
in this scenario is to build a map of the area on which the mobile robot moves. 
Created system has to enable real preview of the built map and must have the 
ability to export the created map of space/area. Moreover, the system using an 
optical sensor and machine learning has to be able to mark certain objects on 
the built map. Additionally, the system must have the possibility to choose the 
option of controlling the vehicle. The first option is to drive autonomously to 
build a map of the room. The second option after building the map is to drive 
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to a specific point on the map and the last required option is manual control of 
the vehicle. A prototype of the mobile vehicle under development is shown in 
Fig. 4. 


Fig. 4. Mobile vehicle prototype. 


4.5 Vehicle Counting 


The fifth is the scenario of Vehicle counting. The main task of this scenario is 
to build a system capable of counting vehicles. Vehicles that will be counted 
are wheeled vehicles such as: cars, trucks and motorcycles. Counting of vehicles 
has to be done separately in two directions for vehicles approaching and moving 
away from the device. Additionally, the device on which the created system will 
operate must be resistant to external weather conditions and it should be as 
small as possible. Prototype of vehicle counting device under development is 
shown in Fig. 5. After installing and testing the device, the system will transmit 
the vehicle counting results via a network to the middleware. Vehicle counting 
device prototype consists of Raspberry Pi 4, Raspberry Pi High Quality Camera, 
6mm 3MP Wide-Angle Lens and Intel Neural Compute Stick 2. 


4.6 Space Surveyer 


The sixth scenario is the space scenario of the Space Surveyer. The main task 
in this scenario is for the mobile robot to explore the terrain and determine 
the landing site of the lander. The created system must allow communication 
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Fig.5. Vehicle counting device prototype. 


between the lander and the mobile robot to exchange information while explor- 
ing the terrain. The mobile robot after exploring the terrain must be able to 
determine the best landing place and send information about it to the lander. 
The site selection will be determined by the detected obstacles, hills or hardness 
of the ground. In case the communication between the platforms fails, the mobile 
robot will be equipped with a light system which will allow the lander to start 
the landing procedure after detection. The light signal from the mobile robot 
will determine the center of the selected landing area. 


4.7 Mission to Mars 


The seventh scenario is the space scenario of Mission to Mars. The main task in 
this scenario is the autopilot function for the spacecraft. The developed system 
using optical sensors and machine learning has to be able to track the planet Mars 
and enable and disable the autopilot function. When the autopilot function is 
enabled, the system will be able to control and correct the flight to the designated 
planet. 


4.8 First Conclusions 


The above scenarios, while each different, have something in common. Namely, 
each of them contains optical sensors and machine learning. So, building one 
system with the previously described modular approach will reduce the time 
needed to build the next system. A user or even a developer involved in building 
AI systems, after introducing a conceptual tool will not have to write a con- 
secutive system from scratch. The gained time can be spent on the process of 
building and training the model, which is time consuming. 

In the first scenario, the previously mentioned QBO platform was used to 
build a watchman application capable of monitoring a room. This platform, 
thanks to the installed servos, allowed searching for objects around the room. 
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The built-in VGA cameras were used for object detection. For image process- 
ing, a programming function library was used, which is OpenCV [13]. It is a 
library mainly oriented to work with computer vision in real time. This library 
has the advantage of supporting hardware acceleration and the ability to load 
machine learning models trained on various available development platforms 
such as Tensorflow [14], PyTorch [15], Cafe [16] and many others. However, for 
proper real-time operation, the QBO platform was modified. The Raspbery Pi 
3 computer platform was replaced with a Raspberry Pi 4 and due to limited 
computing capabilities, an AI gas pedal such as Intel Neural Compute Stick 2 
was added to the computer platform. 

After the modifications, the AI application created in conjunction with the 
QBO platform enabled seamless real-time monitoring of the room and verifi- 
cation of people from the created photo database. Having built a system from 
the first scenario, it is easy to build another system, for example for the fifth 
scenario. Thanks to the modular approach to building applications, the mod- 
ule responsible for the detection of objects from the first scenario can be fully 
applied in the fifth scenario. 

It is enough to change the trained model from people to vehicles and replace 
part of the code responsible for controlling the camera with a new code created 
to count the occurring vehicles. The same is true for second scenario. A minor 
modification of the code and a change of the trained model will allow building 
a parking lot monitoring application without much effort. In the first scenario, 
after replacing the module responsible for controlling servos with a module for 
controlling DC motors, the functionality can be extended to run on a mobile 
platform without much effort. 


5 Conclusions and Further Work 


The paper presents the concept of the tool, which is still under development and 
may change during the implementation process when new problems and chal- 
lenges are discovered within the SpaceRegion project. Different scenarios based 
on terrestrial and space solutions are currently developed and implemented. 
Based on the presented scenarios, the capabilities and requirements of the pro- 
posed tool will be adjusted. We will also analyse and compare implemented code 
for different scenarios in order to find common parts, which will be eventually 
transferred to the proposed tool. A library will be created consisting of several 
sensors of the same type in order to test the behaviour of the application during 
its modification. 

Furthermore, sample applications related to artificial intelligence, robotics 
and computer vision using the conceptual tool will be created. In addition, an 
analysis of the application implementing modifications under scenario change, 
will be performed. The main goal of the proposed conceptual tool is to simplify 
the construction of AI applications. 
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Abstract. The ongoing terrifying wave of COVID-19 which is caused by the 
Severe Acute Respiratory Syndrome Coronavirus 2 or SARS-CoV-2 continues to 
affect human life and disrupt healthcare systems. Chest computed tomography 
(CT) is an effective clinical tool for estimating the patient’s severity levels and 
deciding appropriate treatment regimes. In this paper, we use a deep learning 
method for detecting COVID-19 using chest CT images with the more advanced 
balanced dataset. We used a dataset of 8054 real patient CT scans, of which 
5427 had COVID-19 and 4223 were Non-COVID-19 patient images. Our model 
had an average detection accuracy of 91.96% on the test dataset. In conclusion, 
Automated Deep Learning (DL) methodologies allow for speedy evaluation of 
CT images to detect COVID-19. 


Keywords: COVID-19 - Severe Acute Respiratory Syndrome Coronavirus 2 - 
SARS-CoV-2 - computed tomography (CT) - Deep Learning (DL) - healthcare 
system 


1 Introduction 


The ubiquitous COVID-19 pandemic is the distinguishing tragedy of the present century 
thus far. To combat this battle, clinicians and radiologists work together and invented 
a couple of different diagnostic tools. COVID-19 reportedly started in Wuhan City 
of China in December 2019 [1-3]. According to the Centers for Disease Control and 
Prevention (CDC), the total number of cases and deaths as of 8th May 2022 is 517 million 
and 6.25 million, respectively worldwide. That is why this research has a significant 
value from the social perspective as it helps faster the processes and automate the tasks 
rapidly and build more confidence in the medical fraternity. The numbers continue to 
increase with the spread of the Omicron coronavirus variant (B.1.1.529). COVID-19 is 
a disease that initially affects the respiratory organs, like the lungs [1]. As it is a highly 
transmittable viral infection without a cure, people are dying all over the world and many 
severely suffering patients do not find hospital accommodation due to overloading with 
the existing COVID-19 patients. The Critical Care facility is also overloaded and the 
situation is overwhelming. 


© The Author(s) 2023 
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The diagnosis of COVID-19 is one of the crucial aspects of the fight against this 
disease. In the early days of the pandemic, the detection was very limited and error- 
prone due to a lack of understanding of the virus genome, but researchers came up 
with the Reverse Transcription Polymerase Chain Reaction (RT-PCR) [4] pathological 
method, which is still a gold standard for the detection of COVID-19. However, many 
challenges still remain in the COVID-19 detection research. The RT-PCR sensitivity rate 
is 60%—70%, which is very low. Rapid antigen testing is also a fast method to detect 
COVID-19 patients but due to the different types of variants coming up every now and 
then it is also not so effective as lots of True negatives coming in the test due to the 
mutation of the virus genome. 

Recently, many automated machine learning approaches have been introduced in the 
literature for Covid-19 detection such as Chest CXR, Chest CT, etc. 

In this paper, we present a very efficient deep learning model for analyzing CT 
images. 


2 Literature Review 


As the situation in early 2020 had already started deteriorating and new cases were 
growing exponentially, The World Health Organization (WHO), declared it a pandemic 
on 11th March 2020 [5]. WHO distinguishes the variants of coronavirus into two cate- 
gories, one is a ‘variant of interest’ and the other one is a ‘variant of concern’. According 
to the transmissibility and detrimental effects, the variant of concern is more dangerous 
compared to the variant of interest. 

He, X. et al. [6] present a deep learning diagnosis method using chest CT images. 
As their work was reported in early April 2020, their COVID-19 chest CT image dataset 
was limited. Zhang, K. et al. [7] proposed an AI System for diagnosis, quantitative 
measurements, and prognosis of COVID-19 pneumonia using computed tomography. 
As their research was published in early June of 2020 they used the idea of image slices 
to increase the size of sample data as only 752 COVID-19 positive patient data was 
included in their work. Gunraj, H. et al. [8] proposed a tailored deep convolutional neural 
network (CNN) using chest CT images. They generated a dataset of 104,009 chest CT 
slices across 1,489 patient cases. Harmon, S. A. et al. [9] present a deep neural network 
using a multinational dataset. M. Khushi et al. [10] and Maillo, Triguero and Herrera [11] 
show that having limited positive samples in a dataset could lead to data inconsistency 
and data imbalance issues. Panwar, H. et al. [12] proposes a grad-CAM-based color 
visualization in their deep learning approach using CT images. 


3 Our Proposed Methodology 


Deep Convolutional Neural Network (CNN) is one of the most widely used neural 
networks for image classification problems [13]. CNN ina deep learning setting has been 
widely applied in the literature, for example, [14—18]. With the availability of different 
deep learning networks such as ResNet18, Res-Net50 [19], AlexNet [20], DenseNet [21], 
VGGI6 [22], EfficientNet [24], etc., one can design many automated machine learning 
models that can detect COVID-19 from CXR, CT images, or a combination of both. In 
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our approach, we designed our own classification layer but leveraged the advantages of 
EfficientNet in multi-feature generation layers through the transfer learning technique. It 
helps to generate lower-level and higher-level features in a rapid way. Iterative multiple 
training helps to optimize parameters and hyperparameters with rapid succession due to 
this integrated deep multi-layer approach. 


3.1 Dataset Description and Preprocessing 


The data consisted of chest CT images divided into two categories. COVID-19 infected 
patients and non-COVID-19 patients. We used multiple sources for data acquisition [9, 
12-15] as the opensource COVID-19 data increases over time. These images are labeled 
by licensed radiologists and freely available in open-source platforms like GITHUB 
repositories and Kaggle platforms. Once we obtained the CT images, our next step was 
to preprocess those images with proper annotation and labeling, including normaliza- 
tion and augmentation using machine learning models. We use Pytorch vison models 
[23], to flip, rotate and include those images to increase the size of the dataset, which 
basically helps the deep learning model to be more robust and accurate. The deep neu- 
ral network models derives feature information in a very different way they human 
radiologist visually inspect the images. Initially, some sort of histogram equalizing and 
contrast-enhancing. 


‘COVID’, ‘COVID", "Non-COVID". "COVIO". ‘COVID’. ‘Non-COVID’, ‘COVID’, ‘COVID’, 'Non-COVID’. ‘COVID'. 'COVID'. 'Non-COVID', ‘COVID' 


COVID'. "COVID. ‘Non-COVIO 


Fig. 1. Chest CT images after preprocessing. 


Our proposed model used a dataset of 8054 real patient CT scans, of which 5427 
had COVID-19 and 4223 were Non-COVID-19 patient images. The dataset is further 
divided into train, validation, and test sets with the ratio of 7:2:1. Figure 1 shows the 
chest CT images after preprocessing. Our dataset is very well balanced thus does not 
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incurred data imbalance issues. Data imbalance is a very common problem, especially 
happened in medical data and image analysis if the model used fewer number of positive 
sample cases than the negative sample cases. 


3.2 Methodology 


The Fig. 2. It shows the block diagram representation of our proposed model. EfficientNet 
[24], pretrained weightages helps to reduce the time of our training process as due to 
transfer learning technique of EfficientNet we used the hyperparameter values in our 
model. 


Fig. 2. Abstract Block Diagram Representation of our Proposed Model 


After preprocessing the images we used EfficientNet [24], a pretrained Model, for 
the feature generation tasks and then we add a deep classification layer to classify the 
CT images into COVID-19 position images and COVID-19 negative images, i.e. Non- 
COVID-19 images. After execution of the feature generation tasks, the total number of 
parameters of our dataset is 18,830,011, out of which Trainable parameters are 1,281,395 
and Non-trainable parameters are 17,548,616. 

The architecture of EfficientNet baseline Neural Network is shown in Fig. 3. 
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Fig. 3. EfficientNet BO Baseline neural network Architecture 
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EfficientNet is a very efficient high-performance convolutional neural network archi- 
tecture and scaling method that uniformly scales all dimensions of depth, width and res- 
olution using a compound coefficient. Where benchmark deep neural models arbitrarily 
scale these factors, the EfficientNet scaling method uniformly scales network width, 
depth, and resolution with a set of fixed scaling coefficients. The effectiveness of model 
scaling is based on the baseline network which is playing a key role in high performance. 
The baseline network builds up with a deep neural architecture search using the AutoML 
MNAS framework [25], which optimizes both accuracy and efficiency. After leveraging 
the potential of transfer learning by using the deep learning model EfficientNet in the 
feature generation architecture we design our deep learning classification architecture. 
The elements of the architecture is given in the Fig. 4. We use piecewise linear function 
rectified linear unit in our model as an activation function instead of Softmax or other 
activation function as we are interested in getting probabilistic value from 0 to 1. 


=2 


=(1,1) 


lormalization 


3x3, Stride = 1x1, 
1x1 


padding 
180000, Output 


(1,1), padding 


i 
v 
> 
o 
= 
= 
o 
= 
= | 
[e] 
> 
e 
le} 
U 


atch N 


Convolution Layer, 12x20 


Kernel Size 
Fully Connected layer 


3x12, Kernel Size = (3,3), 
Input Feature 


Stride 


Fig. 4. Classification Architecture of our Deep Learning Model 


The classification layers provide the percentage of the score for COVID-19 patients 
and Non-COVID-19 patients. 


4 Results 


The proposed deep learning model had an average detection accuracy of 91.96% on the 
test CT images. Sensitivity is 92.24, Specificity is 93.01 and F1 Score is 0.90. Figure 5 
shows the Loss Curve. We used three fold cross validation technique and after the 20 
epoch as the values converge to the optimum values so no further training of the model 
needed, We heuristically checked this is the optimum level of the loss in our experiment 
while running with many epoch heuristically to understand the optimum level. Hence, 
we only included the optimum value figures upto 20 epoch. From the graph, in Fig. Sa, it 
is evident the loss decreased over the epoch. Figure 5.b shows the accuracy curve. From 
the graph, it is evident that the accuracy improves over the epoch. 

Figure 6a, b, c are the results coming from an unknown test sample and then matching 
their value for the predicted score obtained using our proposed model. Itis clearly evident 
from the Fig. 6 that the predicted score clearly confirmed the type of the class with very 
high accuracy. 
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Fig. 6. Predicted score matching with the ground truth values for different test images. 


The following Table 1 presents the comparison of the proposed model with other 


existing models using our dataset with the model they described in their papers. 


Table 1. Comparison Analysis with Existing Models 


Model name Accuracy Sensitivity Specificity F1 score 
Proposed Model 91.96 92.24 93.01 0.90 
He, X. et al. [6] 86.45 87.02 87.54 0.86 
Gunraj, H. et al. [7] 87.21 88.82 87.98 0.87 
Zhang, K. et al. [10] 88.67 89.52 89.87 0.88 
Harmon, S. A. et al. [11] 87.36 89.54 89.98 0.89 
Panwar, H. et al. [12] 90.14 89.85 90.51 0.88 
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5 Future Work and Conclusion 


The Deep learning Model is not intended to replace the RT-PCR or other bio-chemist 
methods rather it would help to give extra confidence to the doctors to detect COVID- 
19 with more accuracy, hence helps to reduce wrong diagnosis. Many earlier models 
reported in the literature utilized a limited number of COVID-19 positive images due 
to the lack of availability of such data during the primary phases of the pandemic. 
In the present work, 8054 CT images were used with balanced positive and negative 
cases. The results show that this work which is based on EfficientNet is capable of 
detecting COVID-19 based on CT images with high accuracy. This adds to the growing 
body of evidence that deep learning techniques may be used in a clinical setting with 
confidence for detecting COVID-19. One of the recommendations of WHO is to increase 
testing for COVID-19. Usage of deep learning-based methods would bring down the cost 
and accelerate the diagnosis process which will enable rapid testing on a global scale. 
In our future work, we would investigate vision transformer models with this dataset. 
Transformer architecture performs very well in Natural Language Processing (NLP) 
tasks and is widely used in NLP domains. But recently many research works comes up 
with vision applications as well, though Deep Convolutional Neural Network is mostly 
preferred by most of the researchers in computer vision applications, especially in image 
processing domains, object detection domains, etc. We also like to extend our work on 
our classification layers with more variety of activation functions and loss functions. 
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Abstract. Artificial intelligence (AI) in prostate MRI analysis shows 
great promise and impressive performance. A large number of studies 
present the usefulness of AI models in tasks such as prostate segmenta- 
tion, lesion detection, and the classification and stratification of a can- 
cer’s aggressiveness. This article presents a subjective critical review of 
AI in prostate MRI analysis. It discusses both the technology’s cur- 
rent state and its most recent advances, as well as its challenges. The 
article then presents opportunities in the context of ongoing research, 
which possesses the potential to reduce bias and to be applied in clinical 
settings. 


Keywords: artificial intelligence - deep learning - image analysis - 
magnetic resonance imaging * prostate cancer 


1 Introduction 


Prostate cancer (PCa) is the second most common cancer in men, with almost 
1.4 million new cases diagnosed per year worldwide [22]. With the accelera- 
tion of the industrialization process and the impact of environmental pollution, 
the incidence of PCa—caused by enriched foods, smoking, and excessive alcohol 
use—continues to increase at a rate of 6.63% per year. Early detection of PCa can 
improve patients’ prognoses considerably. Recent advances in MRI technology 
that allow both anatomical and functional imaging to be performed simultane- 
ously, mpMRI, have improved our ability to detect and characterise prostate 
tumors [16]. According to patient management guidelines, noninvasive diagnos- 
tic tools such as mpMRI play an important role in the referral of patients to 
© The Author(s) 2023 
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active surveillance, watchful waiting, and active treatment [15,24]. The Prostate 
Imaging Reporting and Data System (PI-RADS) was introduced by the Euro- 
pean Society of Urogenital Radiology (ESUR) in 2012 to standardize prostate 
mpMRI examination protocols and the reporting of suspicious lesions (providing 
standardized terminology and sector map-based locations). The PI-RADS sys- 
tem categorizes prostate lesions based on the likelihood of cancer according to a 
five-point scale. PI-RADS was developed by a consensus-based process that uses 
a combination of published data, and expert observations and opinions [20]. The 
clinical utility of PI-RADS scoring is growing, and several studies have confirmed 
that the system improves the diagnostic accuracy of mpMRI [12]. 

The definitive diagnosis of PCa depends on the recognition of cancer cells in 
a tissue biopsy. Based on histological tumor architecture, a Gleason classification 
system was proposed. The Gleason score of biopsy-detected PCa comprises the 
Gleason grade of the most extensive (primary) pattern, plus that of the second 
most common (secondary) pattern, and ranges from one to five [11]. The “clin- 
ically significant” (csPCa) descriptor is used widely to differentiate PCa types 
that cause morbidity or death from those that do not. Defining what is clinically 
significant and what is insignificant PCa (iPCa) is challenging. According to 
the literature, iPCas do not typically cause harm and are at high risk of being 
overtreated, with the treatment itself risking harmful side effects to patients [4]. 

In recent years, a large number of review articles that concern the applica- 
tion of artificial intelligence (AI) in prostate cancer diagnosis have been published 
(7-9,17,21]. They discuss various aspects of AI application in PCa, which con- 
cern not only to mpMRI image analysis, but also ultrasound image analysis, 
histopathology image analysis, MRI-ultrasound fusion, MRI-histopathology reg- 
istration, and clinical outcome predictions. In 2022 alone various different review 
articles were published. Li et al. [13] presented an extensive review over a long 
period that studied the applications of machine learning (ML) and deep learning 
(DL) in prostate MRI segmentation, registration, lesion detection and scoring, 
and treatment decision support. Sushentsev et al. [19] analyzed two classes of AI 
method: DL and traditional machine learning (TML), demonstrating their com- 
parable performance in the differentiation of csPCa/iPCa, as well as discovering 
common methodological limitations. According to the authors, consensus on 
datasets, segmentation, ground truth assessment, and model evaluation remains 
to be established. The narrative review of Belue and Turkbey [5] introduced 
emerging medical imaging AI paper quality metrics, such as the Checklist for 
Artificial Intelligence in Medical Imaging (CLAIM) and Field-Weighted Citation 
Impact (FWCI), and applied those analyses to the top AI models for segmen- 
tation, detection, and classification of PCa—including potential areas of impact 
in radiologists’ workflow. Although those methods are commonly reported in 
the literature with promising results, the authors concluded that prospective 
multicenter studies are necessary to determine the impact of AI on improving 
radiologists’ performance. Sunogrot et al. [18] provided an interesting review 
that focuses on open datasets, commercially/publicly available AI, and grand 
challenges. The authors concluded that well curated public datasets are avail- 
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able, but are relatively small and vary in quality. Computational AI challenges 
are needed to deliver independent validation and to build trust in AI for prostate 
MRI. 

This article presents a subjective critical review of AI in prostate MRI anal- 
ysis. It considers the most recent advances, challenges, and opportunities pre- 
sented in the context of the ongoing project, Al-augmented radiology - detection, 
reporting and clinical decision making in prostate cancer diagnosis (INFOS- 
TRATEG) being conducted at the Laboratory of Applied Artificial Intelligence 
of the National Information Processing Institute in Poland. 


2 The Potential of Artificial Intelligence in MpMRI 
Analysis 


Current clinical practice and guidelines utilize mpMRI prior to biopsies to iden- 
tify potentially suspicious lesions. Radiological interpretation, together with rel- 
evant clinical information, supports clinicians in proper patient management, 
which remains crucial in light of the high prevalence of PCa and its low mortal- 
ity rate. Many patients with no cancer or with indolent cancer can benefit from 
long-term active surveillance and lower numbers of unnecessary biopsies; this, in 
turn, minimizes the occurrence of unnecessary side-effects, such as pain, bleeding, 
and infection. Despite continuous improvement in MRI technique, interpreta- 
tion of prostate MRI remains challenging and is generally recognized to present 
a steep learning curve [13]. Low specificity and high interobserver variability 
remain problematic disadvantages of MRI—particularly for nondedicated or less 
experienced radiologists, who have received only short term training in prostate 
MRI [23]. At present the processing and interpretation of prostate mpMRI data 
in clinical routine is performed chiefly by human experts; it remains highly sub- 
jective and strongly dependent on experience and training. Improving the PCa 
diagnostic pathway (and potentially reducing overdiagnosis of iPCa and under- 
diagnosis of csPCa) is a key challenge. AI techniques may support the radio- 
logical workflow of PCa diagnosis and reduce interobserver variability among 
radiologists. This enables more consistent diagnoses by clinicians, which can 
result in improved patient outcomes. AI models utilize the quantitative nature 
of imaging data to construct a more robust feature space based on mpMRI repre- 
sentation. AI models can help in cancer diagnosis by facilitating ancillary tasks 
in cancer detection that are labor- and experience intensive, such as prostate 
gland segmentation, PCa detection on mpMRI images, and characterization of 
a cancer’s advancement and aggressiveness [6]. 


2.1 Prostate Segmentation 


Prostate gland segmentation aims to outline the whole gland boundary, as well 
as its zonal division. This is critical for the calculation of the entire prostate 
volume and of the serum prostate-specific antigen density, which are important 
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PCa biomarkers. Manual segmentation of the prostate and its zones is a time- 
consuming and tedious task. It is also highly subjective and dependent on the 
experience of the radiologist. Prostate gland segmentation is used commonly in 
clinical practise estimation of prostate volume based on the ellipsoid formula, as 
it is easy to apply, highly time-efficient, and is characterized by high interobserver 
agreement; however, it offers only an approximation of a reality, which, in many 
cases, is much more complex. AI/DL methods have high potential to reduce the 
time and variability associated with prostate gland segmentation on MRI. 

In [3], the authors propose a segmentation pipeline that comprises three con- 
volutional neural networks. The first localizes the prostate by creating a bound- 
ing box. The second completes prostate gland segmentation by classifying each 
pixel as belonging either to the prostate or to the background. The third differ- 
entiates between the transition zone and the peripheral zone by classifying every 
voxel in the image as one of these two classes. Each of the convolutional neural 
networks was implemented using a customized hybrid three-dimensional/two- 
dimensional (3D/2D) U-Net architecture. The model achieved mean Dice scores 
for segmentation of 0.940 for the whole prostate, 0.914 for the transition zone, 
and 0.776 for the peripheral zone. Recently a comparison of three standard DL 
architectures for prostate segmentation was proposed in [10]. UNet, an efficient 
neural network (ENet), and an efficient residual factorized ConvNet (ERFNet) 
were trained and tuned on the PROSTATEx public dataset to segment the 
whole gland and the transition zone separately (the peripheral zone masks were 
obtained by subtraction). The top result was achieved by ENet: 91% for the 
whole gland, 87% for the transition zone, and 71% for the peripheral zone. 


2.2 Prostate Lesion Detection and Characterization 


Identifying and characterizing csPCa is a crucial component of proper treat- 
ment planning. The probability of csPCa can be assessed radiologically based 
on PI-RADS (although even when using the current version, v2.1, considerable 
inter-/intrareader disagreement is observed frequently. AI/DL methods have the 
potential to become common tools for differentiation between csPCa and icsPCa, 
and for assessment of the locations and extents of aggressive cancers. 
Typically, AI models are subdivided into two groups of methods, with regard 
to the nature of the input data and the expected result of the analysis. The 
first group focuses on lesion detection, uses whole MRI images for analysis and 
provides pixel-level output, as well as extracting regions with probable csPCa 
and icsPCa. The pixel-level analysis provides pixel-level probability maps of 
cancer distribution. This produces patient-level predictions of suspicious areas 
automatically. Although Al-based detection models are typically of the two-class 
variety (csPCa versus non-csPCa), multiclass lesion detection models—in which 
detection results relate to different grading systems like histopathological Inter- 
national Society of Urological Pathology (ISUP) score to express the aggressive- 
ness of csPCa, or radiological PI-RADS score—are also viable. The second group 
of models, dedicated purely to lesion classification, assumes radiologist-outlined 
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lesions (regions of interest) as inputs, which are then sorted into different cat- 
egories. Some two-class or multiclass models aim to automate stratification at 
lesion level. 

Studies reported in the literature demonstrate detection models that range 
between 75% to 85% accuracy; however, the methodology and study population 
are highly diverse. Nevertheless, it can be observed that the results reported 
fall within the range of reported radiologist performance [8]. A key challenge 
for fully automated detection algorithms is the number of false positive lesions. 
Mehralivand et al. [14] presented a new, fully automated, DL-based PCa detec- 
tion system for prostate MRI using a large scale, diverse, expert annotated train- 
ing dataset. Although the approach achieved reasonable performance metrics, 
an average of 0.8—1.9 false positive lesions per patient were reported. Multiclass 
lesion classifications are more varied due to their diverse cohort sizes and method- 
ology. When classifying according to ISUP grade, mean AUC per classification 
category can range significantly—even for the same category. In a review study 
conducted by Twilt et al. [21], ISUP 3 category mean AUC ranged between 0.379 
and 0.96. 


3 Urgent Challenges 


3.1 Datasets 


The AI community continues to wait for extensive and well annotated datasets. 
In PCa research most datasets are small (often derving from a single institution) 
that are homogeneous in acquisition protocols and scanner manufacturers. The 
development of robust and unbiased AI models requires large, heterogeneous, and 
reliably annotated datasets, which reflects the variability of cancer’s appearance 
and the diversity of equipment manufacturers. Labels are typically created by a 
single expert; however, intraobserver variability exists even between experienced 
radiologists in the assessment of cancer extent on different mpMRI modalities 
and in the selection of individual lesions features. Such differences continue to 
be observed, despite the introduction of the PI-RADS standard. Dataset label- 
ing should be performed during multireader studies, including interdisciplinary 
discussions and panels, to reduce bias in labeling. 


3.2 Defining Ground Truth 


“Ground truth” refers to the labels that are assigned to expert annotations. 
Radiological delineations without histopathology confirmations are severely lim- 
ited due to the high risk of missing cancer foci that were not identified by a 
radiologist. In PCa the most reliable validation is based on retrospective identi- 
fication of cancerous regions on whole-mount radical prostatectomy specimens, 
which can be projected onto an mpMRI scan. Although histopathology informa- 
tion is evident, this method requires advanced MRI-histopathology registration. 
Moreover, manual pathologists’ annotations are even more time consuming than 
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radiological delineations, which limits the possibility of creating large datasets. 
Prostatectomy is typically performed on cancer at advanced stages, which limits 
the possibility of including cases of low risk or indolent cancer. The alternative 
approach assumes pathological information from biopsies. The use of systematic 
biopsy is limited due to the random sampling procedure and the range being lim- 
ited mainly to the peripheral zone, which might entail the missing or undersam- 
pling of some PCa foci, and lead to underdiagnosis. The best solution assumes 
pathological confirmation from target biopsies, implemented under a fusion of 
MRI and TRUS-ultrasonography. PCa candidates are typically selected by a 
prebiopsy MRI in which a radiologist highlights PI-RADS 3 or above lesions. 
Much attention should be paid to accurate mpMRI analysis, which, in the case 
of database creation might relate to multireader verification of visible lesions 
and verification with biopsy history. 


3.8 Different Evaluation Criteria 


Comparison of different AI models is quite problematic—not only due to the 
diversity of the databases used and the various cohort sizes, but also in context of 
the lack of standardized evaluation criteria. In the case of PCa, which progresses 
gradually, using clinical endpoints in the form of patient outcomes like death or 
recurrence is mostly unviable. This underlines the importance of defining suitable 
benchmarks and verification criteria for model evaluation. Organization of grand 
challenges and open databases might improve the validation and benchmarking 
of models. 


3.4 Limited Multireader Studies and Prospective Evaluation 


As well as comparisons between AI models, attention should be directed toward 
multireader studies that compare the efficiency of AI models with that of radi- 
ologists or clinicians. Such comparisons may better indicate the potential of AI 
models—particularly in the support or training of less experienced specialists. 
Finally, a prospective evaluation in a controlled medical environment should be 
performed to open the door for clinical deployment, preceded by clinical trials. 


4 Future Directions 


We observe the development of AI models in PCa diagnosis and their increas- 
ing effectiveness every day. We are moving, incrementally, toward personalized 
medicine, and exploring the potential of radiomics and domain knowledge, while 
attempting to overcome the existing limitations and challenges. 

A research group at the Laboratory of Applied Artificial Intelligence of 
the National Information Processing Institute is conducting the AI4AR PCal1] 
project, which is dedicated to the analysis of mpMRI images for PCa diagnosis. 
During the initial phase reliable, well annotated databases of mpMRI examina- 
tions are being compiled. The researchers aim to collect between 400 and 600 
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cases with full clinical descriptions. All cases are annotated by three experi- 
enced radiologists who have over five years of experience in describing mpMRI 
examinations and who possess practical knowledge of the PI-RADS standard. 
Radiologists analyze mpMRI independently, without access to historical infor- 
mation. All cases are later analyzed by an interdisciplinary team of researchers, 
radiologists, and clinicians to reduce bias and to confirm lesions’ locations and 
extent. Ground truth is based on MRI-ultrasound fusion guided targeted prostate 
biopsy. Only cases confirmed histopathologically, with verified lesion locations 
that correlate with historical biopsy data, are treated as reliable and are included 
in the database. 

Simultaneously a novel structured reporting system is under development as 
a flexible environment for the structured and standardized reporting of radiolog- 
ical image data. The system possesses a modular architecture and is integrated 
with the XNAT|2] imaging informatics platform, which facilitates common man- 
agement, storage and quality assurance tasks for imaging and associated data. A 
dedicated module for prostate mpMRI structured reporting was proposed, which 
is standardized according to the PI-RADS radiological lexicon. The structured 
report scheme is also used in the data labeling process; all cases in the database 
possess not only visual labels, but also structured ones, which contributes to the 
uniqueness of the proposed dataset. In the second phase, the database will be 
extended by inclusion of cases from other medical centers to increase its hetero- 
geneity regarding different equipment manufacturers and acquisition protocols. 

The final implementation of the system is planned in the form of a radiological 
educational platform that is dedicated to learning the structural reporting of 
prostate mpMRI examinations, and considers various forms of support by AI 
algorithms. Verification of the platform in the third phase of the project assumes 
the conductance of multireader studies to assess the effectiveness and impact 
of AI models on the quality and accuracy of the reporting process. A study 
of the behavior of platform users will allow us to assess the potential of AI 
for less experienced radiologists, and will standardize or improve human reader 
performance. 


5 Conclusions 


AI in prostate MRI analysis shows great promise and impressive performance 
that is comparable to that of human experts. Overcoming the technology’s limi- 
tations and demonstrating its clinical effectiveness will unlock opportunities for 
clinical deployment in the form of educational systems, reporting and patient 
management support systems, and “second reader” or patient triage systems. 
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Abstract. In recent years multiple deep-learning solutions have 
emerged that aim to assist radiologists in prostate cancer (PCa) diagno- 
sis. Most of the studies however do not compare the diagnostic accuracy 
of the developed models to that of radiology specialists but simply report 
the model performance on the reference datasets. This makes it hard to 
infer the potential benefits and applicability of proposed methods in diag- 
nostic workflows. In this paper, we investigate the effects of using pre- 
trained models in the differentiation of clinically significant PCa (csPCa) 
on mpMRI and report the results of conducted multi-reader multi-case 
pilot study involving human experts. The study aims to compare the per- 
formance of deep learning models with six radiologists varying in diag- 
nostic experience. A subset of the ProstateX Challenge dataset counting 
32 prostate lesions was used to evaluate the diagnostic accuracy of mod- 
els and human raters using ROC analysis. Deep neural networks were 
found to achieve comparable performance to experienced readers in the 
diagnosis of csPCa. Results confirm the potential of deep neural networks 
in enhancing the cognitive abilities of radiologists in PCa assessment. 


Keywords: Deep learning * Prostate Cancer - Computer Aided 
Diagnosis 


1 Introduction 


In light of the increasing incidence rate of prostate cancer (PCa) over the 
previous years [2], there is a global focus on providing modern solutions that 
can address this growing health issue. Noninvasive diagnostics based on multi- 
parametric magnetic resonance imaging (mpMRI) became essential in clinical 
decision-making as it enables more accurate risk stratification and therefore plays 
an important role in selecting patients for biopsy and direct targeting of lesions 
[6,11]. 

Radiological assessment of the prostate gland involves the interpretation 
and reporting of mpMRI examinations according to the established global stan- 
dards. The current version of the standardized prostate MRI assessment Prostate 
Imaging-Reporting and Data System (PI-RADS v2.1) [8], provides an approach 
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to the interpretation and reporting of PCa examinations. The system assumes 
the evaluation of each mpMRI sequence separately. Each lesion assessment cat- 
egory is established on a 5-point scale according to the assessment algorithm 
involving previously scored sequences. Introducing the standard in diagnos- 
tic practice improved the diagnostic accuracy of performed examinations and 
improved the availability of the method. Low specificity, however, remains a 
considerable aspect of MRI assessment and clinically significant (cs) PCa differ- 
entiation, potentially leading to unnecessary biopsies. Because of the assessment 
complexity and steep learning curve, it is mainly the case for radiologists with 
low experience in prostate MRI reporting [10]. 

Recently, solutions based on machine learning have achieved promising 
results in applications in PCa diagnostics. A PCa classification challenge held in 
2017 (ProstateX) provided a way of comparing tools of automatic PCa differen- 
tiation based on mpMRI [1]. 71 competing methods were evaluated on the lesion 
classification task. The area under the receiver operating characteristic curve 
(AUC) of submitted models ranged between 0.45 to 0.87 AUC. The top three 
scoring teams achieved results of AUC = 0.84 and 0.87. We used the ProstateX 
dataset to develop and validate the deep convolutional neural network (CNN) 
model that achieved AUC = 0.84 on the test ProstateX dataset [7]. 

Narrative Review by Twilt et. al. [9] presents an overview of recently pro- 
posed tools (between 2018 and 2022) that have been suggested to aid in the 
diagnosis of PCa. Overall deep learning (DL) solutions achieve the highest per- 
formance on PCa detection and diagnosis tasks. Computational models show 
the potential in enhancing the diagnostic processes and increasing the specificity 
of mpMRI assessment. However, only a limited number of studies validated the 
results in clinical workflows - 85% of them report only stand-alone model diag- 
nostic accuracy [9]. It remains a question of how the diagnostic accuracy of DL 
solutions relates to that of radiology experts and what could be expected from 
the integration of computational models in diagnostic workflows. 

The objective of our study was to evaluate the diagnostic accuracy of radiol- 
ogists with various levels of diagnostic experience in comparison to the proposed 
DL solution for csPCa differentiation. 


2 Methods 


The retrospective study design involved the assessment of 32 suspicious lesions by 
six radiologists and the deep CNN model in a multi-case multi-reader (MCMR) 
setting. 


2.1 Dataset 


A group of cases from a publicly available database of annotated mpMRI data 
were used in the study [3]. Complete mpMRI data (T2W, DCE, DWI, and ADC 
sequences) were included for all cases in the database. We have selected a thirty- 
two lesion dataset diversified according to its clinical significance based on the 
results of a histopathological evaluation. 
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The selected dataset contained: 


— 14 PZ lesions (7 cs and 7 not cs), 
— 11 TZ lesions (5 cs and 6 not cs), 
— 7 AS lesions (4 cs and 3 not cs). 


2.2 Radiological Assessment Study 


The study was carried out by a group of specialists with diversified expertise. 
Six radiologists involved in the study had practical experience in PCa diagnosis 
based on the mpMRI (three specialists with diagnostic experience of one to five 
years; three specialists with more than ten years of diagnostic experience, and 
at least five years of experience using the PI-RADS standard). Those groups 
of experts are referred to in the paper as experienced and inexperienced raters. 
The participating experts did not interact with each other during the assessment 
phase. 


Assigned PI-RADS Category 
Zone 
25. EM PZ 
mm AS 
mm TZ 


20 


PI-RADS Category 


Fig. 1. PI-RADS score evaluations for lesions in the dataset. Only three assessments 
assigned lesions to the PIL-RADS 1 category (only PZ lesions). 


Experts participating in the study were not involved in the dataset selection, 
development of the study methodology, and experiment results analysis. 

The results of the assessments are presented in the Fig. 1. Even though the 
dataset was balanced (close to an equal rate of cs and non-cs lesions) the distri- 
bution of assigned scores did not reflect that. 


2.3 Deep CNN Model 


The model evaluated in this study was a multi-modal deep CNN network 
of VGG-inspired architecture, adapted to the input sequence resolution and 
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problem complexity. Introduced modifications reduced the number of trainable 
parameters. The developed model architecture design reflected a PI-RADS cat- 
egory assessment algorithm based on lesion zonal location. This was done by 
integration of output routing and using complex loss function for optimization. 
Resulting predictions were assigned using two subnetworks designed to base 
predictions on T2W and DWI (subnetwork for TZ, AS, and SV lesions) and on 
DWI/ADC and DCE (subnetwork for PZ lesions). The model has been evaluated 
on the test reference dataset and resulted in a score corresponding to the state- 
of-the-art results (AUC = 0.84) [1]. Model architecture and analysis of achieved 
diagnostic accuracy have been described in the previously published study|7]. 

CNN predictions were made on unseen samples using 5-fold cross-validation 
and collecting validation split classification results. 


2.4 Probability Mapping 
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Fig. 2. The left figure shows the results of a mapping of raw CNN predictions to 
bins corresponding to PI-RADS categories. Separate raw predictions for PZ (raw_pz) 
and TZ and AS lesions (raw_tz-_as) show different response characteristics resulting 
from separate network optimization processes. The right figure presents the empirical 
cumulative distribution function (ECDF) of bins in relation to normalized CNN output. 


It could be argued that continuous predictions resulting from softmax output 
layers can produce superior AUC results in comparison to the ordinal estimations 
made by human experts using the Likert scale. Therefore, to further evaluate 
the diagnostic characteristics of the proposed model, we have mapped the raw 
continuous CNN predictions to bins corresponding to PI-RADS scores (Fig. 2). 
Bin discretization involved several steps that mapped the raw CNN predictions 
to ordinal categories. We have used the mode of PI-RADS assessments for lesions 
resulting from a radiological assessment study as ground truth for labels. 
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First, continuous outputs resulting from PZ and TZ/AS sub-networks were 
normalized to the range of [0,1]. Leave-one-out cross-validated estimates of bins 
were obtained for each lesion using two ordinal regression models|4] (separate 
each subnetwork predictions). This resulted in the mapping of continuous net- 
work output to the Likert scale that reflected the PI-RADS category character- 
istics based on the CNN prediction (Fig. 2). 

We have mapped the 5-point Likert scale of manually assigned PI-RADS 
categories and automatically estimated bins to the [0, 0.25, 0.5, 0.75, and 1] 
probability values to perform the ROC analysis. 


2.5 Statistical Analysis 


We compare the model performance with that of experienced and inexperienced 
radiology specialists using assessments collected during the retrospective study. 
To evaluate the differences, we have used the Area under the Receiver operating 
characteristic Curve (AUC) as a measure of diagnostic accuracy. Extensive simu- 
lations using bootstrap re-sampling (1000 tests) [5] were conducted to construct 
95% confidence intervals and perform hypothesis testing in various scenarios. We 
have conducted separate experiments to compare the diagnostic characteristics 
in relation to the combinations of assessment methods (CNN, human raters), 
lesion location (PZ, TZ, and AS), and examinator experience. An alpha of 0.05 
was used as the cutoff for statistical significance (we additionally report test 
results with an alpha of 0.1 due to the small dataset sample size). 


3 Results 


The following section presents the results of the comparison of diagnostic accu- 
racy between inexperienced, experienced radiologists and model predictions. 
Additionally, we report the change in diagnostic accuracy resulting from the 
integration of human and CNN assessments. 


3.1 Results of Raw CNN Predictions 


The results achieved by the CNN model (AUC = 0.83, CI [0.80, 0.88]) demon- 
strated superior diagnostic accuracy in comparison with both: 


experienced (AUC = 0.80, p > .1, CI [0.74, 0.86]) and 
inexperienced (AUC = 0.71, p < .1, CI [0.63, 0.80]) 


specialists in the evaluation of lesions’ clinical significance using the PLRADS 
v2.1 standard. 

The lowest diagnostic accuracy has been observed for AS lesions, where CNN 
solution provides higher quality estimations in comparison both to experienced 
and inexperienced radiologists (AUC = 0.79 vs. AUC = 0.64 vs. AUC = 0.59). In 
the case of other lesion locations, the differences between the neural network and 
the experienced radiology specialists were less pronounced: PZ (AUC = 0.88 vs. 
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AUC = 0.85 vs. AUC = 0.72) and TZ (AUC = 0.89 vs. AUC = 0.84 vs. AUC 
= 0.78). Differences in diagnostic accuracy dependent on lesion location were 
however not statistically significant. 

Following analyses were performed using binned CNN predictions. 


3.2 CNN Performance Compared to Human Raters 


Comparison of diagnostic accuracy (AUC) 
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Fig. 3. Diagnostic accuracy of inexperienced, experienced (left), and all (right) assess- 
ments in comparison to the CNN predictions, expressed in established AUC values and 
95% confidence intervals. 


The Fig. 3 presents the results of a comparison of diagnostic accuracy measured 
in AUC between inexperienced and experienced raters in comparison to model 
predictions restricted to ordinal categories. Overall, CNN achieved superior (p 
< .1) diagnostic accuracy (AUC = 0.83, CI [0.79, 0.87]) in comparison to all 
(AUC = 0.76, CI [0.70, 0.81]) and inexperienced (AUC = 0.72, CI [0.63, 0.80]) 
rater assessments. 

Differences were statistically significant for PZ lesion evaluation when con- 
sidering assessments of all (CNN AUC = 0.90 vs AUC = 0.78, p < .05) and 
inexperienced raters (AUC = 0.71, p < .05). There were no statistically signifi- 
cant differences found in diagnostic accuracy between assessments of experienced 
raters and CNN predictions. 


3.3 Diagnostic Accuracy of Combined Assessment 


To investigate the potential change in diagnostic accuracy by integration of 
computer-aided assessment we analyzed the potential results of combining 
human and automatic predictions. Integrated predictions were obtained by com- 
puting average expert and binned CNN predictions on the lesion level and map- 
ping those back to the Likert scale. This allowed investigation of the potential 
gain in diagnostic accuracy in computer-aided PCa diagnosis. 
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Fig. 4. Diagnostic accuracy for assessments of inexperienced and experienced radiolo- 
gists compared to combined predictions expressed in established AUC values and 95% 
confidence intervals. 


A positive diagnostic accuracy change has been observed after combining the 
model predictions with expert assessments in all tested settings (Fig. 4). Integra- 
tion of CNN with rater predictions resulted in an overall increase of diagnostic 
accuracy by 0.09 AUC (CNN+rater AUC = 0.85, p < .05, CI [0.80, 0.89]) 


4 Discussion and Conclusion 


In this study, we investigated the performance of the deep CNN model for PCa 
diagnosis on mpMRI data in comparison to human raters in an MRMC study 
setting on a subset of the reference dataset. 

The results suggest that the proposed model outperformed inexperienced 
radiologists and achieved diagnostic accuracy similar to that of experienced 
raters. The achieved results are promising, yet decisive conclusions cannot be 
drawn confidently given the study design and small sample size used for valida- 
tion. 

Our study had several limitations. First of all, the dataset size used for valida- 
tion in the conducted study was limited by the availability of readers. The study 
involved a substantial number of readers, however, the analysis has been per- 
formed in subgroups defined based on radiologist experience, which limited the 
number of assessments considered in hypotheses testing. The modest sample size 
resulted in wide 95% CIs constructed using bootstrap simulations and therefore 
affected the power of performed statistical tests. Furthermore, the study design 
was far from the clinical setting and based on the evaluation of selected single 
lesions. Finally, we could not evaluate the stability of the model performance on 
external data. 

Although promising, results need confirmation in further, more extensive 
studies. 
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Abstract. We explored unconditional and conditional Generative Adversarial 
Networks (GANS) in centralized and decentralized settings. The centralized set- 
ting imitates studies on large but highly unbalanced skin lesion dataset, while the 
decentralized one simulates a more realistic hospital scenario with three insti- 
tutions. We evaluated models’ performance in terms of fidelity, diversity, speed 
of training, and predictive ability of classifiers trained on the generated synthetic 
data. In addition, we provided explainability focused on both global and local fea- 
tures. Calculated distance between real images and their projections in the latent 
space proved the authenticity of generated samples, which is one of the main 
concerns in this type of applications. The code for studies is publicly available 
(https://github.com/aidotse/stylegan2-ada-pytorch). 


Keywords: GAN - federated learning * skin lesion classification - XAI 


1 Introduction 


In recent years, the use of neural networks has become a very popular and attractive 
topic for many medical researches [7,9,17], as one of the key promises of using Arti- 
ficial Intelligence (AI) in healthcare is its potential to improve diagnosis. However, 
to create reliable deep learning (DL) algorithms that can identify complex patterns of 
medical conditions, they must be trained on a large amount of data. In addition, it is 
desirable for the model to have a diverse range of cases, as data from a single source 
may be biased by the acquisition protocol or the population [6, 20]. 

Unfortunately, preparation and annotation of medical data is a costly procedure that 
demands the assistance of medical specialists. Additionally, access to medical data 
requires a lengthy approval process due to patient privacy concerns. This makes it 
almost impossible for different institutions to share data and thus expertise with one 
another. Although there are some high quality open access dataset initiatives [9,17], 
there is still a great need for much more diverse and complex databases to effectively 
apply DL. 

Synthetic data appears to be a good solution to mitigate the issues with privacy poli- 
cies. It can be used in two ways - firstly as extensions of small and unbalanced datasets 
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(e.g., of rare diseases) and secondly for anonymization purposes (to replace instead of 
augment real samples). In both scenarios, synthetic medical data must accomplish two 
competing goals. The data should accurately reflect the real data and simultaneously 
offer strong privacy protection for the individuals whose records were used to create it. 

In this work we perform a detailed study of GAN-based artificial data generation in 
the case of International Skin Imaging Collaboration (ISIC) 2020 [17] database using 
StyleGAN2-ADA [11] in conditional and unconditional settings. All trained models 
are evaluated in terms of both fidelity and diversity. Furthermore, we conduct an exten- 
sive latent space analysis of the generated images to better understand the structure 
of the real and synthetic images for the subsequent binary classification task (benign 
and malignant). Performed evaluations base on image editing in latent space, local and 
global explanations of trained classifiers. As far as we know, such detailed analysis has 
not been attempted before. 

Moreover, to deal with a more realistic scenario where a single hospital does not 
have a sufficiently large dataset to generate artificial data, we simulate a scenario with 
three hospitals with a different amount of data each. We propose to use Federated 
Learning (FL) [16] with the aim to synthesise a more complex, fair and diverse dataset 
through the collaboration of multiple medical institutions without exchanging local data 
samples. 


2 Materials and Methods 


2.1 International Skin Imaging Collaboration Database 


In our experiments, the reference dataset for real images is based on the training set of 
the ISIC 2020 challenge [17] extended by malignant cases from previous years’ compe- 
titions [15]. The database consists of the 37 648 images — the whole ISIC 2020 dataset, 
adding 4522 malignant samples from ISIC 2019 — where 20% were used for valida- 
tion in first phase of central trainings. Later, we splitted the training subset based on 
patient ID attributes. To make the FL setup more appropriate, we ensured that the data 
from an individual patient would not be present on more than one client. For this setup, 
we created 3 clients and for them, data subsets with 2k, 12k, 20k images respectively. 
For each client the proportion of malignant and benign was roughly the same as in the 
whole dataset. In all experiments, we resized the input images to 256 x 256 pixels. 


2.2 Training Details 


We investigated StyleGAN2-ADA performance using an original implementation from 
NVIDIA Research group!. We trained StyleGAN2-ADA models with each of the two 
classes of training set as input, as well as in a conditional setting with and without aug- 
mentations. To select the best model, we considered both the Fréchet Inception Distance 
(FID) [8] and Kernel Inception Distance (KID) [5] metrics, along with training speed, 
similarly as proposed in [4]. The classification task was performed using EfficientNet- 
B2 model [19], pretrained on ImageNet, with Ross Wightman’s implementation”. Dur- 
ing training, we used the Adam method for optimizing the network weights with an 


' https://github.com/N Vlabs/stylegan2-ada-pytorch. 
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adaptive learning rate initialized to 5 x 1074. We trained the models for a maximum 
of 20 epochs and an early stopping with a patience of 3 epochs. We applied standard 
data augmentation techniques, such as random rotation, horizontal and vertical flip, 
during the training phase for all experiments. For the experiments in a FL setup, we 
used Flower framework [3]. In our simulated setup, we created a network with 3 clients 
with different amounts of data and a server, where the weights of the trained model 
were exchanged every 100 iterations. We used the Federated Average (FedAvg) algo- 
rithm [14] as it is an effective and simple method that is commonly used for federated 
aggregation. 


2.3 Evaluation Protocol 


Various dimensions should be considered when evaluating GANs [2]. Firstly, fidelity 
as a measure of reliability, and diversity as a measure of fairness. FID and KID met- 
rics evaluate these two characteristics, but rely on a preexisting classifier trained on 
ImageNet, and are insensitive to the global structure of the data distribution. Also Pre- 
cision (P) and Recall (R) scores measure, respectively, the fraction of synthetic samples 
that look realistic (fidelity) and the fraction of real samples that the model can synthe- 
size (diversity). Perceptual Path Length (PPL) [12] estimates whether and how much 
latent space is entangled or regularized, ultimately being able to capture the coherence 
of images. Another dimension to look at is predictive performance, referring to the fact 
that samples should be as useful as real data when used for the same predictive purpose. 
Here, we built a melanoma classifier using synthetic data for training and real data for 
testing. Since privacy is the most important factor in medical study, we evaluated the 
generalization or authenticity of the generative process [2], which measures the model 
capability to creation of new samples. Additionally, a survey was conducted in which 
experts assessed whether each of the 200 tested images is real or generated artificially 
by cGAN. Finally, we investigated whether it is possible to edit the image by manipulat- 
ing the latent input of the trained GAN. The semantic factorization (SeFa) [18] method, 
as it do not need a large sample of latent vectors and auxiliary classifier, was tested 
to see if we could obtain directions in latent space, where the influence of one feature 
could be controlled while preserving the rest of the image. 


3 Results 


3.1 GANs Trainings 


In the first phase of our experiments, we established the best model in terms of fidelity 
and diversity using well-known metrics such as KID, FID, P, R, and PPL (see Table 1). 
It is worth noting that the GAN responsible only for malignant melanoma generation 
(mal-GAN) had around 6 times less data than for benign cases (ben-GAN). In general, 
the unconditional models have lower PPL scores, showing better regularity of latent 
space due to the fact that they model only the distribution of one class. Additionally, 
the vast majority of malignant melanoma examples in ISIC 2020 and ISIC 2019 show 
a black dermatoscope frame, which leads to the generation of darker images. 
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The conditional setting was used to provide the model with a wider variety of 
images, since there are a lot of characteristics that are common for both classes. 
Achieved higher FID and KID scores confirmed that this is beneficial for the minority 
class (malignant). For this setting, we used ADA mechanism with and without (w/o cot) 
color augmentation, and achieved the best scores for second one. Color augmentation 
leads to leakage of color hue (unnatural red or violet) to the generated examples. Sub- 
jective assessment based on four responses in a qualitative survey, from two dermatol- 
ogists and two deep learning experts, achieved an overall average accuracy of 54% for 
participants (at level 58% for dermatologists and 50% — deep learning experts). There 
was no feature in any image that clearly suggested to the participants that the image is 
either real or synthetic. 


Table 1. Calculated metrics for each of the generative models tested in the centralized setting. 


Scenario KID (%) FID |P ÎR | PPL 
ben-GAN | 0.42 7.99 | 0.77| 0.45 | 60 
mal-GAN | 0.47 15.46 | 0.62 | 0.40 | 51 
cGAN 0.32 7.33 | 0.75 | 0.42 | 193 
CGAN,„/ocoi | 0.24 7.02 |0.75 |0.44 | 101 


In case of simulated hospital scenario in FL setup, we observed faster convergence 
(1.6 times) and improved quality of the generated images mainly for the client with the 
smallest data resources. As the data distributions between different clients only differed 
in size, we put more emphasis on the classification task with centrally trained models. 


3.2 Predictive Performance with Classifier 


After the evaluation with general metrics, we performed a study on predictive perfor- 
mance to measure how useful the synthetic data is for the subsequent task, i.e. malignant 
melanoma diagnosis. As a baseline for the experiments, we first train the classifier on 
training subset of the real images of ISIC dataset, and then tested it on the validation set. 
Secondly, GAN-based augmentation was performed using two types of GANs models 
with two scenarios: training on balanced synthetic dataset with 55k images (syn) and 
testing on real validation subset (the same as in baseline experiment) and training on 
real images adding 22k synthetic melanoma samples (aug) to balance the dataset. The 
introduction of highly underrepresented malignant melanoma cases improves the classi- 
fication accuracy roughly of few pp. in both scenarios, as sumarised in Table 2. Overall 
GAN-based augmentation technique does not provide reliable improvements in case of 
classification using the whole ISIC 2020 and malignant samples from ISIC 2019. 


3.3 Explanations of the Predictions 


To measure the authenticity we projected 12k samples from the real dataset into the 
latent space of the generator. This gave us the latent codes that caused our genera- 
tor to synthesize the most similar output to the input image. To optimize for a latent 
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Table 2. Calculated metrics for each of the classification scenarios with EfficintNet-B2: trained on 
real (real-baseline), only synthetic samples (syn), and augmented balanced dataset with additional 
22k fake malignant lesions images (aug) from conditional (CGAN) and unconditional (GAN) 
models. In all scenarios the models were tested on the same real images validation set. 


Scenario Acc AUC Scenario | Acc AUC Scenario Acc AUC Scenario Acc AUC 


(%) (%) (%) (%) (%) (%) (%) (%) 


real 97.8 98.8 syn-GAN | 94.1 94.2 | syn-cGAN 94.7 96.7 | syn-cGANw/ocot | 92.6 92.7 
baseline aug-GAN | 97.8 98.6 aug-cGAN 97.8 98.6 | aug-cCGAN;,/ocoi | 97.9 98.8 


code for the given input images, we followed [1]. We used a VGG16 model as a fea- 
ture extractor, computed the loss on the difference of the extracted features for both 
the target image and the generated output, and performed backpropagation. Next, we 
extracted the features of both the real and their projected images using the last con- 
volutional layer of our classifier trained on real and synthetic data (aug-CGAN,, jo cot). 
These embeddings were visualized in a 3D space using t-distributed stochastic neigh- 
bor embedding (t-SNE) method [13]. This allows visually exploring the closest near 
neighbors of each real image using cosine distances. Figure | shows examples of real 
images projections in the latent space of the generator (with benign marked on red, 
malignant — blue) and projected embeddings of real and synthetic data. In both cases 
there is visible separation between two clusters created by two examined skin lesion 
classes. However, there are still plenty of the cases in the middle between two clusters 
and mixed with improper class, what is visible in Fig. l(a). Additionally, we spotted 
some clusters inside classes, which are associated with instrumental bias, such as ruler 
and black dermatoscope frame. 


Fig. 1. Real images projections in the latent space of the generator (a). Projected embeddings of 
real and synthetic data coming from the classifier trained on synthetic data (b). 


For a more systematic inspection, we computed the cosine distances between the 
different pairs of real images and their projected samples. The mean distance was equal 
to 0.1444 and the median 0.00283 with only two projections being too close in terms 
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of Q1 = 0.013 (range of le—5) to the real images. Only in these two cases the closest 
neighbor was the projection of the target image, meaning that the generative model 
could have memorized that sample. We treated this as a measure of the authenticity of 
the generated samples. We also spotted that some of the images were very distant from 
their projections (around 2) but still resembled the target image (Fig. 2(a)). 
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Fig. 2. A few examples of the closest (red frame) and the most distant (blue frame) pairs real- 
synthetic in terms of cosine distance (a). Two examples from the malignant class, which were 
found in the center, and in the boundary between two clusters respectively, examined using XRAI 


heatmaps (b). Examples of image editing using the SeFa framework shifted along the 2nd (c), 4th 
(d) and 6th (e) eigenvectors. 


b 


The images in the center and boundary between the two clusters (Fig. 1(b)) were 
studied using local explanations with the XRAI method [10]. For images of malignant 
lesions that belong to the centroid of the embeddings, we found that the mole itself is 
the most important part of the image for the final prediction. In the sample image, the 
network focuses on boundary pixels which represent asymmetry in the mole, one of the 
main clues for detecting malignant melanomas. On the other hand, in edge cases the 


Assessing GAN-Based Generative Modeling on Skin Lesions Images 99 


results are not as evident due to image distortions or poorly centred moles (Fig. 2(b)). 
We selected all the misclassifications and edge cases and generated N neighbors using 
a distance of 0.1 to augment the dataset with more complex examples with the aim of 
making it more robust. First experiments showed an improvement in performance in 
those edge cases. 

Finally, we edited the latent input in an attempt to eliminate the dermoscopic frame 
in malignant melanoma images using the SeFa method [18]. The latent w-vector corre- 
sponding to the image in Fig. 2(c)—2(e) is shifted along the 2nd, 4th and 6th eigenvec- 
tors. The image in the middle in all three rows (3rd column) is the original image. Left 
and right of the original image are positive and negative directions along these eigen- 
vectors. The eigenvectors displayed were chosen from a larger qualitative evaluation of 
100 images along the first 10 largest eigenvectors. Applying SeFa image editing sug- 
gests less entangled features from visual inspection of the images of different directions 
— the black frame was removed leaving the other features (such as shape, size, color) 
almost intact. 

To assess the quality of the edited images, we first generated a large sample size 
of images all containing frames. After acquiring the images we removed the frames by 
shifting the latent vectors along the direction where the presence of the dermoscopic 
frame was minimized. Finally, we trained a classifier on these images for the malignant 
melanoma with a training set of 10k images per class and a test set consisting of real 
images, which result in accuracy equalled 87%. 


4 Discussion 


In our study, we explored the state of the art DL-based techniques to generate, classify, 
and explain computed results for skin lesion diagnosis. Our experiments are based on 
ISIC 2020 and ISIC 2019 datasets, which are one of the largest but very unbalanced 
open access database. 

Samples generated using different types of GANs and settings exhibits slightly dif- 
ferent appearance, as evidenced by the calculated metrics shown in the results (see 
Sect. 3). The PPL measure, which is capable of capturing the consistency of the images, 
is the lowest for generated malignant melanoma samples by unconditional GAN. How- 
ever, this is not connected with the lowest KID and FID scores indicating the dissimi- 
larity between two probability distributions (real and fake) using samples drawn inde- 
pendently from each distribution. Lower PPL score is related to the smallest amount of 
malignant data, and in result more regularized and narrow distribution of latent space. 
The second observation may be connected with the fact, that KID and FID rely on a pre- 
existing classifier (InceptionNet) trained on ImageNet that consists of different images 
rather than skin samples. The results also indicates that the cGAN model is prone to 
generating more realistic looking melanoma (using some features from benign sam- 
ples) than the mal-GAN. No statistical conclusion can be drawn from the small sample 
size in the survey where cGAN generated images were used. However, the results do 
suggest that subjectively, experts are unable to tell an artificial lesion from that of a real 
patient. There was no specific feature that the experts picked up on in the generated data 
as an artifact of the model. Therefore, qualitatively the synthetic data pass for real in 
the eyes of experts. 
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In the case of classification, we have not observed a large improvement of the perfor- 
mance of the classification network based on synthetic data generated by StyleGAN2- 
ADA. Actually, the results achieved in different scenarios do not differ much. This may 
be affected by the large size of the real dataset, but also by the fact that some features 
coupled with, for example, methods of collecting data (existence of black dermoscopic 
frame) may be entangled with a specific class. 

Performed exploration of the latent space showed that there is a clear separation 
between the projections of the real and generated samples. Measured distance between 
the projections of real and the closest synthetic image proved the authenticity of the 
generated samples. Our main interest in the explanation of classification results focused 
on the edge cases, as the dermatologists are paying special attention to those cases 
that lie in the boundary and are not so obvious. We noticed that the network output 
is often biased by acquisition protocols, as well as some patient-related features. The 
main issue seems to be the area covered by the mole on the image. However, this 
topic requires closer examination. Editing images using latent directions could be a 
useful tool in removing unwanted artifacts from images. Nevertheless, dermoscopic 
frames were present mostly in images of malignant melanoma, thus the characteriza- 
tion of class labels was entangled with dermoscopic frames. This entanglement resulted 
in changes in separate features when removing the frame artifact and did not leave 
the malignant melanoma data intact. For future steps, using this technique may show 
promising results in data normalization and generalization in different domains. 

On the other hand, as GAN training requires a large investment in computing and 
data resources, the FL setup may be a solution for smaller institutions with a lack of 
access to sufficient data resources. Achieved results confirmed that generation of skin 
lesions in a distributed setup can lead to similar performance with respect to the quality 
and diversity of generated samples, with a significant faster convergence. However, to 
reach a final verdict on this matter, it is necessary to conduct further research into dif- 
ferent aggregation algorithms, privacy preserving techniques, and even defense mecha- 
nisms against adversarial attacks. 


5 Conclusions 


GAN-based augmentation is an extensively explored technique for medical imaging 
applications, especially in the case of very rare diseases. First of all, it helps in the cre- 
ation of larger and more balanced datasets. Secondly, it creates non-real data, which can 
be more easily shared amongst the medical community. However, the results achieved 
with the addition of synthetic data reported in literature show an improvement in accu- 
racy of only a few percents without clearly explaining the reason. On the other hand, 
GAN-based anonymization suffers from an unset gold standard in measuring its perfor- 
mance. 

To utilize GANs in generating synthetic healthcare data, a number of considerations 
need to be made. First, one should consider the architecture. In our case, we chose 
between central unconditional GANs per class, conditional GAN and FL setup. The 
usefulness of chosen architectures mainly depends on computational resources and time 
- unconditional GAN can be good option with small amount of classes due to long 
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duration of training of single GAN. If a massive, annotated dataset exists, training the 
GAN centrally is preferable but in case of a more realistic scenario of data being siloed 
in an institution, the benefit from FL is noticeable particularly for smaller institutions. 

Second, the created synthetic data should be inspected from multiple different points 
of view. Common features to emphasise are fidelity and diversity, which are important 
to understand how well the synthetic data represents the underlying real data. Impor- 
tantly, as the goal in healthcare is to avoid sharing data, it is also crucial to inspect the 
authenticity of the synthetic examples to make sure they are not simply copying the 
training data. Additionally, the synthetic data should be as useful as the real data for the 
subsequent task (e.g. classification) and not allow inferences based on features that are 
not related to the case, but, for example, to the way the data were collected (e.g., linking 
a black dermatoscope to malignant melanoma). 
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Abstract. The detection of prostate cancer is an important challenge 
for medical personnel. To improve the medical system’s ability to process 
increasing numbers of oncological patients, demand for automation sys- 
tems is growing. At the National Information Processing Institute, such 
systems are undergoing active development. In this work, the authors 
present the results of a pilot study whose goal is to analyze possible 
directions in the development of new, advanced deep learning systems 
using a high quality dataset that is currently in development. 


Keywords: Prostate cancer - Magnetic resonance imaging * Medical 
image segmentation 


1 Introduction 


Prostate cancer is one of the most common neoplasms in men [6]. This indicates 
the importance of developing systems for its efficient detection, treatment, and 
monitoring. The gold standard of cancer diagnosis is the study of histopathology; 
however, due to high variability in the structure of the prostate gland, partic- 
ularly among older patients, the selection of optimal sites for biopsy remains 
challenging. This explains the necessity of medical imaging. The most estab- 
lished imaging modality for prostate cancer detection is multimodal magnetic 
resonance imaging (MRI). However, the interpretation of the multimodal 3D 
images requires time and expertise from radiologists. The increasing average age 
of patients and the rising prevalence of cancers place intense pressure on med- 
ical organizations to supply enough skilled personnel to meet growing demand. 
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One possible solution for alleviating this problem lies in the design of automated 
systems for cancer detection. This, in turn, has led to growing demand for high 
quality datasets and deep learning algorithms. Both solutions are undergoing 
active development at the National Information Processing Institute. 

The selection of architecture is one of the most crucial decisions that influ- 
ences a model’s performance. Until recently, most of the research conducted 
in computer vision was based on convolutional neural networks, during a time 
when natural language processing tasks witnessed an explosion of transformer- 
based architectures. However, according to new research in computer vision, 
transformer-based architectures promise performance that is consistently better 
than that of convolutional neural networks [8]. One of the main characteristics of 
convolutional neural networks is the enforcement of models to include informa- 
tion on the local co-occurrence of image features, which have been proven to be 
a significant inductive bias. Pure transformers do not share this characteristic; 
they learn the spatial correlations between image features via attention mech- 
anisms. This adds a number of degrees of freedom to the models that enable 
them to learn the nonlocal, long-range dependencies in images, at the cost of 
requiring larger datasets to achieve the same performance. Moreover, the newest 
research [19] tackles the high memory requirements of nonmodified transformer 
architectures and the technical problems in training larger models on graphical 
processing units. One solution involves fusing convolutional and transformer- 
based architectures to take advantage of both using a hybrid transformer. This 
can be achieved by inserting a transformer into different layers of a U-shaped 
architecture, composing architectures, and using attention mechanisms on fea- 
tures calculated by convolutional neural networks [8]. The authors of this article 
concentrated on the first type of hybrid architecture, as they have already proven 
to be efficient in multimodal MRI settings [7] and specifically in prostate cancer 
detection [17]. At the time of writing, no consensus exists on the best available 
transformer-based architecture for prostate cancer detection and segmentation. 
This points to the necessity of further research and experimentation, the prelim- 
inary results of which are presented below. 


2 Material and Methods 


The data used to train and validate the model was accessed from Artificial Intel- 
ligence and Radiologists at Prostate. Cancer Detection in MRI: The PI-CAI 
Challenge [1]. The data encompasses 1,500 partially labelled cases of prostate 
parametric MRI (bpMRI). The labels, when present, indicate the locations of 
prostate cancer. The algorithm described below utilized T2-weighted imaging 
(T2W), axial-computed high value (> 1400s /mm2) diffusion-weighted imaging 
(DWI), and axial-apparent diffusion coefficient maps (ADCs). The labels were 
annotated manually by human experts, and at least two changes were consid- 
ered significant for the International Society of Urological Pathology (ISUP). 
The main library used in the work was Monai [4], which is a PyTorch-based 
[14] framework for deep learning in the medical imaging domain. To improve the 
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code structure and training time, the code was refactored for use with Pytorch 
Lightning [5]. Image preprocessing was completed using the proposed algorithm 
from the PI-CAI Challenge [1], based on the nnUnet [9] architecture. All prepro- 
cessing steps were implemented as Monai transforms. Image augmentations were 
performed using the batchgenerators library [10]. To improve the reproducibility 
of the algorithm, training and inference were conducted using Docker containers 
[13]. All experiments were performed in the Google Cloud cluster using a server 
with NVIDIA A100 40 GB RAM GPU. 


2.1 Preprocessing 


The MRI data was normalized in each channel using z-score normalization. The 
image shape was set to (256, 256,32) for it to be a multiple of the sixteen in 
each axis, as the chosen architecture required. The spacing of the dataset was 
highly inhomogeneous; for this reason, all images were resampled to achieve 
(0.5,0.5,3.0) voxel size. Image augmentations were performed using the batch- 
generators library [10] and encompassed Gaussian noise, elastic deformations, 
Gaussian blur, brightness modifications, contrast augmentations, simulations of 
low resolution, and mirroring. All of the labels were converted to binary masks 
and included in augmentations that led to spatial deformations of the original 
images. 


2.2 Deep Learning Architecture 


We selected Swin UNETR [7] as the architecture because it demonstrates char- 
acteristics that are crucial for the further development and finetuning of the 
algorithm on the new dataset in development. The neural network architec- 
ture is based on transformers. This has multiple advantages over traditional, 
convolution-based architectures. Primarily, it increases the receptive field, which 
enables the learning of long-range image dependencies. It partially avoids trans- 
lation invariance of convolutions, which, in the context of medical imaging, 
can lead to the loss of relevant location-based information. Transformer-based 
architectures also have generally higher expressive power due to their less pro- 
nounced inductive bias. However, such architectures also cause difficulties due to 
their high memory footprint and relatively poor performance on small datasets 
(because of reduced inductive bias). The architecture is summarised in Fig. 1. 

For the current work and the dataset in development, the Swin UNETR 
architecture has additional crucial characteristics that are well suited to mod- 
elling multimodal images. As a transformer architecture, it is possible to extend 
Swin UNETR to incorporate clinical data in tabular form. 


2.3 Optimization 


The model’s optimization was implemented using the PyTorch AdamW [12] 
optimizer. Cosine annealing with warm restarts [11] was used for the Learning 
Rate Scheduler, and the initial learning rate was established by the Learning 
Rate Finder [18] implemented in PyTorch Lightning. 
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Fig. 1. A simplified schematic diagram of Swin UNETR on the basis of Fig.1 from 
Hatamizadeh et al. The input comprises four channels with output of segmentation of 
whole gland, ADC, HBV, and T2W values [7] 


2.4 Hyperparameter Selection 


Hyperparameter tuning was achieved using a genetic algorithm implemented in 
the Optuna [2] library. Hyperparameter tuning was used in the selection of the 
optimizer, architecture, and optimizer-related decisions like the Learning Rate 
Scheduler. 


2.5 Postprocessing 


The training was conducted as a five-fold cross-validation using splits provided 
by the contest organizers, and the outputs of each fold were combined by a mean 
ensemble algorithm. The model’s output was passed through a sigmoid activa- 
tion function before lesion candidates were extracted using the report guided 
annotation library [3]. The proposed lesions were analyzed further by assess- 
ment of simple radiomic characteristics that are important for the task at hand; 
this can help increase the model’s precision by filtering out some false positive 
results. Proposed lesions were assessed for their: 


— size, where too big and small lesions were filtered out; 

— elongation and roundness, where highly elongated changes were filtered out, 
as they typically represented the obturator internus muscle or some of the 
large vessels in the pelvis; 

— the hypointensity of the ADC map and the hyperintensity of a high b-value 
DW image, defined as the difference of the mean value of complementary 
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Fig.2. A transverse T2W image of the prostate. In green, the gold standard label 
indicates prostate cancer; in blue, the changes detected by the neural network (and on 
the left, after being filtered by post-processing). 


modalities concerning a lesion's neighborhood. As the presence of hyperin- 
tense lesions on a high b-value DW image with related hypointense signal 
intensity on the ADC map is typical for prostate cancer, lesions that failed 
to meet this criterion were filtered out. 


Figure2 presents an example of the algorithm output, before and after the 
changes are filtered out by their radiomic features. 


Table 1. A summary of the simple shape statistics of segmented instances 


elongation | physical size | roundness 
TP | Min 1.0 18.3 0.4 
FP 0.0 0.4 0.1 
TP | Max 3.2 40726.5 1.2 
FP 7.8 75598.7 1.7 
TP | Median | 1.4 1512.1 0.8 
FP 1.9 1402.5 0.9 
TP | STDEV | 0.4 5227.1 0.1 
FP 0.8 3789.3 0.2 
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Fig. 3. A histogram of the distribution of roundness values: in blue, false positives; in 
red, true positives. The number of samples per histogram bin is scaled logarithmically. 


3 Results and Discussion 


Validation of the algorithm was performed using the Picai evaluation library [15] 
on the validation dataset provided by the contest organizers. Preliminary results 
for the model give a Ranking Score of 0.531, Area Under the Receiver Operat- 
ing Characteristic curve of 0.686, and Average Precision of 0.376. An analysis of 
simple radiomic characteristics was performed and is summarized in Table 1. For 
each measured quantity—elongation, physical size, and roundness—the incor- 
rect segmented instances presented approximately two times higher standard 
deviations, which indicates far higher variability. This also suggests a far wider 
distribution of the aforementioned quantities and the possibility of identifying 
suitable thresholds that define some of the segmented instances as false posi- 
tives with high probability. As an example, in Fig.3, one can observe that in 
the dataset, all segmented instances with roundness lower than 0.4 were false 
positives. A similar analysis can be performed for all other quantities. However, 
final conclusions regarding increases in model specificity using radiomic-based 
postprocessing require further study. 

The results suggest that the model performs comparably to the state-of- 
the-art non-transformer-based baseline architectures provided by the contest 
organizers. However, a significant number of the top-ranking results that are 
presented on the contest leaderboard are based on transformer architectures. 
This demonstrates their impressive ability to learn the presented task and the 
presence of further opportunities for optimization. 
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4 Conclusions 


This study indicates the usefulness of new transformer-based architectures in 
multimodal three-dimensional medical imaging. An additional feature considered 
necessary for analyzing the dataset is the proven ability of transformer-based 
architectures to incorporate data from different sources [16]. This provides a 
strong base for incorporating clinical data directly into the neural network archi- 
tecture. Radiomic analysis performed in the postprocessing step proved helpful in 
the study by increasing the model’s specificity; work on more advanced radiomic 
analysis is fully justified. The use of model PyTorch-based libraries enabled effi- 
cient training, which supplies further proof of its efficiency. Such tools can serve 
as the basis for additional work on the algorithm’s development. 
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Abstract. Stock management is very important for the companies to supply nec- 
essary demand for the products they sell, to pricing the products and to the aspect 
of storage cost. In stock management, the products to be sold to the customer are 
procured by ordering from the vendors. The orders given to the vendors are deter- 
mined by estimating the sales quantities of the products. When estimating sales, if 
we order in large quantities, the storage and expiration dates of the products may 
exceed, or if we order less than demand, the customer cannot find the product in 
the store. With the Covid-19 pandemic entering our lives, there have been some 
changes in our habits. One of these changes is the change in shopping habits of 
people due to the isolation period. By managing this change in terms of stock 
management on the store side, it ensures that people can reach the products they 
demand in these difficult times and that companies do not create extra costs by 
making more stock than necessary. We made a sales forecast on 5-It sunflower oil 
which is a basic food product using the data of a grocery chain with machine learn- 
ing methods and developed models to use these forecasts in stock management. 
Our data is multivariate and contains both quantitative and qualitative features. In 
our study, we used the supervised learning method and the XGBoost, LGBMRe- 
gressor and Ridge models used in many machine learning projects. As a result of 
our studies, an improvement of approximately 25% has emerged with the features 
we added specifically for the pandemic. 


Keywords: Machine Learning - Demand Forecasting - Decision Support 
System - Stock management 


1 Introduction 


Markets are workplaces where customers can meet their needs. It is essential that the 
market supplies the needs of the customers, namely the demand. If the market does not 
present the customer’s needs to the customer, this may lead to customer dissatisfaction 
and loss of customers for the market [1]. The reputation of the stores in the eyes of the 
customers is also important, their customers want to buy the product they want at an 
affordable price. Inventory management plays an important role in customers’ access to 
products and in affordable prices compared to other competing markets [2]. Effective 
stock management benefits the market and the customer [3]. 


© The Author(s) 2023 
C. Biele et al. (Eds.): MIDI 2022, LNNS 710, pp. 111-123, 2023. 
https://doi.org/10.1007/978-3-031-37649-8_12 


112 E. Yildirim et al. 


Inventory management is to ensure that the products offered by the markets to their 
customers are supplied from the suppliers and that the customers can reach them on 
the shelves of the market at more affordable prices than the competitors of the market. 
There are two criteria here; It should be ensured that the customer has continuous and 
affordable access to the product [4]. When the criteria are met, customer satisfaction 
will increase the profitability rate in the market. 

In order to be able to manage the stock, it is necessary to predict how much of each 
product will be sold [5]. If the sale of the product is underestimated and the customer 
cannot reach the product, it will cause loss of customer and reputation in terms of the 
market. If we overestimate the sales of the product, we have to store the product, which 
increases our warehouse cost and therefore the cost of the product. It makes it difficult 
to offer affordable products to our customers. In another case, when we have excess 
demanded product and the expiry date of the product has passed, we can no longer sell 
the products and the cost of the product increases [6]. However, it causes additional costs 
because the products in the warehouse or on the shelf have to be removed for the market. 

In extraordinary times such as stock management pandemics, it becomes even more 
important that the demand, which consists of the needs of the people, can be supplied 
by the market [7]. This situation, which is out of normal, causes abnormality in sales. 
Especially during the pandemic period, together with the problems experienced in logis- 
tics, stock management and thus sales forecasting are of vital importance in presenting 
the required products to customers [8]. 

Sales forecasting provides continuation of sales before stocks run out and pro- 
vides real-time forecasts suitable for all situations [9]. Statistics, mathematical models, 
machine learning and deep learning etc. on stock management. Many techniques are 
used [10]. Instead of a rule-based model [8], we decided to examine this issue using a 
machine learning technique that could extract the change in sales from the data. First of 
all, we made general analyzes on our data and tried to understand our data. In the light 
of the analyzes we made, we tried to find features that would positively affect the result 
while predicting sales by performing feature engineering on our data. We developed a 
model using our data and newly discovered features and the XGBoost machine learning 
model [11], which is frequently used in making predictions. In our study, with the devel- 
opment of the pandemic feature, an improvement of approximately 25% has emerged 
in sales forecasts. 


2 Dataset 


Our data set includes sales records of 5-1t sunflower oil product in all branches between 
01-05-2019 and 01-05-2021 of a chain market consisting of 10 branches located in var- 
ious parts of Turkey. Data set contains 1613 daily records. The selected time period also 
includes the COVID-19 pandemic period so that the pandemic effect can be examined. 
With the COVID-19 pandemic, there have been unusual curfews and serious difficulties 
related to logistics [12]. Limits have been imposed on the working hours of the markets 
and the number of customers inside. At the same time, restrictions were introduced for a 
certain segment of people to go out to the street, depending on their age. Such restrictions 
on both markets and customers have led to significant changes in sales [13]. The features 
included in our data are given in Table 1. 
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Table 1. Data information. 


Property Explanation 

Date Date of the sale 

Sales Quantity Amount of sales made during the day 
Promotion Promotional information on the product 


Available Per Day | Existing product found at the beginning of the day before the sale starts 


Sale price Selling price of the product 


Cost The cost of the product to the market 


Using the date information here, the period without a pandemic before March 2020, 
when the pandemic effect was seen, and the period after it were described as the pandemic 
period. 


3 Methodology 


3.1 Problem Identification 


Figure | shows our methodology. The methods that are actively used to forecast the 
sales of the product in non-pandemic times have changed due to changes and limitations 
in shopping habits during the pandemic. When we use the machine learning algorithm, 
which is trained with normal data, under pandemic conditions, it makes bad predictions 
compared to the previous period estimates due to the change in the data. Due to this 
change, the previous forecasting model will have a high margin of error during the 
pandemic period. In this study, it is aimed to prevent this. 


3.2 Data Preparation 


After our data was obtained from the database. It was pre-processed as follows in order 
to eliminate the errors and deficiencies found in our data. Some of these errors are Iden- 
tifying the missing areas in our data and removing them from our data will prevent our 
model from making mistakes and making biased learning during the training. Features 
in the dataset are typecasted in order to extract additional features. For example: date 
columns transformed string to datetime type. 


3.3 Exploratory Data Analysis 


In this study, only human exposure estimation was made without distinguishing activity. 
Exploratory data analysis provides the opportunity to summarize our data, discover the 
features of our data, understand the relationship between features, and detect abnormal 
data by analyzing our data with statistical methods and data visualization methods. These 
analyzes can also be used during feature extraction processes. 
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Fig. 1. Flowchart for our processes in sales forecasting 


General information can be obtained by examining our data statistically first. When 
we examine Table 2, where the data are analyzed statistically, we can see that the sales 
figures of the pandemic period have decreased in general. We analyzed the sales data 
of our product statistically: the number of records, average sales, standard deviation, 
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minimum value, maximum value and data distribution of 25%, 50%, 75%. We obtained 
these values in Table 2 using the print(df.describe()) function of the pandas library. As 
can be seen in the table, there has been a decrease of approximately 30% in the average 
of sales. Changes in other areas are also at this level, and sales have generally decreased 
as expected during the pandemic period (Fig. 2). 


Table 2. Data information. 


Criteria Pandemic Era Regular Era 
Data Number 1186 427 
Average sale 19 31 
Minimum 0 0 
25% 9 13 
50% 14 20 
15% 22 35 
Maximum 368 730 
140 STATUS 
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">. PANDEMIC 
100 
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Z 80 
O 60 
40 
20 | 
00. 20 400 600 
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Fig. 2. Distribution of data 


In order to see the changes in our sales data with and without pandemics on the basis 
of days of the week, we obtained Figs. 3, 4 and 5 using the seaborn library and column 
(bar plot) and line (line plot) graphics. Percentage changes are expressed in the bar chart. 

In the graphs in Fig. 4, a column chart of the sales in the pandemic period (shown 
in red) and in the non-pandemic period (shown in green) is shown by weekdays. In this 
chart, the change in sales is observed due to the expected weekend closure. Percentage 
changes are given on the column. When we interpret the graph, we can see that the sales 
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in the normal period are high on the weekend. In the pandemic period, we can see that 
sales are less due to the weekend closure. 


Average Sales Amount by Week Days in the Pandemic and Non-Pandemic Periods 
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Fig. 3. Average sales by weekdays during the pandemic and non-pandemic period. 


In order to understand the changes in sales on weekdays and weekends, we showed 
our data in the form of time with and without pandemics, as in Fig. 3, with a column 
chart as the average sales on weekdays and weekends. When we examine these graphics, 
we need to update our model that we use in normal times, because the sales character 
has changed during the pandemic period. In order to predict sales during the pandemic 
period, a model suitable for the development of our model should be put forward by 
adding new features. 


Average Weekday and Weekend Sales During the Pandemic and Non-Pandemic Periods 
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Fig. 4. Average weekday and weekend sales in the pandemic and non-pandemic period. 


As seen in the line graph in Fig. 5, sales follow a decreasing trend on weekends. 
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Average Sales Quantity by Week Day in the Pandemic and Non-Pandemic Periods 
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Fig. 5. Average sales per weekday in the pandemic and non-pandemic period 


3.4 Feature Extraction 


In the light of the graphics in the analysis section, the pattern and characteristics of 
our data have changed due to the change in sales habits during the pandemic period. 
Because of this change, when our machine learning models are trained with the data 
before the pandemic, the consistency in the predictions is lost because the pandemic 
data is different. For this, we can add new features, taking into account the information 
in the analysis section, so that our model can better learn the changing data and non- 
pandemic data. In this way, we can better predict the sales during the pandemic period 
and make better stock management by placing the orders accordingly. We use pandas 
library s ready-made functions on time data types while extracting properties. 

These features were extracted from the time feature of our data in order to be able to 
deal with the year, month, week of the year, which day of the year, which day of the week, 
and when and in which situation the sale was made. Thanks to these features, we expect 
our model to establish a connection between sales data and time and to better learn the 
sales values made at the same time. We evaluate these temporal features categorically, 
and we want our model to evaluate it that way. Since there will be a connection between 
time-based sales, records with the same time feature will have similar values and this 
approach will enable our model to learn better. 

We extract a true-false property from our time feature, whether it is weekdays or 
weekends. Weekday and weekend sales are different in the retail industry. It is said by 
retail experts that people shop more during the holidays. In order to express this situation 
in our data, we add this feature to our data. 

The character of the sales during the pandemic period, which we observe in the 
graphics, is changing. We add a categorical feature to our data that takes values as 
pandemic and non-pandemic when it is in the pandemic period. 
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3.5 Dataset Separation 


We divide our data into two as data to be used in training the models and then as data to 
be used in testing in order to measure the predicted performance of the trained model. 
When dividing the data, considering that our data is a time series, we determine the 
training and test parts by shifting. In order to see the change over time in the time series 
data, the part immediately after the data taken for training is taken for testing. Figure 5 
illustrates this approach. Each shift is evaluated as a step, and the model is trained with 
the training data at each step, and evaluation metrics are obtained with the test data. 
Then the average of the scores obtained in all steps is taken. 

Time series cross validation method is used before a certain time for training and 
after for testing, so we can objectively train time series data and measure the performance 
of our model. We did it using the time series cv algorithm of the sklearn library. 

tscv = TimeSeriesSplit(n_splits = 3, test_size = 2). 

Example: Size of data set = 1613. 


1. Fold= 1 
a. Train size = 1607 
b. Test size = 2 

2. Fold = 2 
a. Train size = 1609 
b. Test size = 2 

3. Fold = 3 
a. Train size = 1611 
b. Test size = 2 


Data 


Train | Test 
Train | Test | 
Train | Test | 
Train | Test 


Fig. 6. Time series cross validation method. 
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3.6 Regression and Machine Learning Model 


Regression is a statistical measurement that tries to determine the strength of the rela- 
tionship between a dependent variable and other independent variables, and makes pre- 
dictions according to this power. It is used as a predictive modeling method in machine 
learning where an algorithm is used to predict the dependent variable. 

Solving regression problems is one of the most common applications for machine 
learning models, especially in supervised machine learning. Algorithms are trained to 
understand the relationship between independent variables and an outcome or dependent 
variable. The model can then be used to predict the outcome of new and invisible input 
data or to fill a gap in the missing data. 

Machine learning is a sub-branch of artificial intelligence algorithms. It has a struc- 
ture that can learn on data, and allows us to make predictions after this learning. This 
model takes an input and an output data as training data. The algorithm establishes a 
function by learning between the input data and the output data and learns the statistical 
pattern in our data. Estimates are made with the learned model. 

Here, we decided to use the XGBoost and LGBMRegressor model with the tree- 
based gradient boosting [14] feature, which is the most popular of the machine learning 
algorithms, and the linear-based Ridge model. This choice was made taking into account 
both the overall success of the models in machine learning and the number of records of 
our data. Due to the scarcity of data we have, deep learning methods were not preferred. 

XGBoost and LGBMRegressor algorithms are decision trees-based machine learning 
algorithms that use gradient boosting. XGBoost has brought some improvements over 
plain GBM, such as the use of regularization, pruning, and parallelization to prevent 
over-learning. It works faster than other algorithms with its parallel operation. It is used 
in many projects and competitions. Since the XGBoost library is an open source project, 
it is developed and supported by many users. 

XGBoost, short for Extreme Gradient Boosting, is a scalable distributed gradient 
assisted decision tree (GBDT) machine learning library. It provides parallel tree rein- 
forcement and is the leading machine learning library for regression, classification and 
sorting problems. XGBoost, a supervised machine learning method, uses algorithms to 
train a model to find patterns in a dataset containing tags and features, and then uses the 
trained model to predict tags in a new dataset’s features. 

LGBMRegressor has the following differences from XGBoost. Instead, it grows trees 
in the form of leaves. He chooses the leaf he believes will provide the greatest reduc- 
tion in loss. Also, LightGBM does not use the commonly used rank-based decision tree 
learning algorithm, which looks for the best split point in the sorted feature values, as 
XGBoost or other applications do. Instead, LightGBM implements a highly optimized 
histogram-based decision tree learning algorithm, which provides huge advantages in 
both efficiency and memory consumption. The LightGBM algorithm uses two new tech- 
niques called Gradient-Based Unilateral Sampling (GOSS) and Special Feature Packing 
(EFB), which allow the algorithm to run faster while providing a high level of accuracy. 

Ridge regression is a method of estimating the coefficients of multiple regression 
models in scenarios where the independent variables are highly correlated. It has been 
used in many fields, including econometrics, chemistry, and engineering. 
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3.7 Performance Evaluation 


The metrics allow us to evaluate our model’s predictive performance after training. It 
allows us to measure how successful a model trained with training data is on the test 
data, or how wrong predictions it makes. We will use the MAPE [15] and RMSE [16] 
metrics to measure our model. These metrics are frequently used in regression tasks. 

The problem we are trying to solve is a regression problem. Among the most preferred 
metrics in regression problems are MAPE and RMSE. The reason why we use two 
different metrics at the same time here is to be able to look at our mistakes from two 
different angles and to be more objective. 

MAPE, Average absolute percent error (MAPE), also known as mean absolute per- 
cent deviation (MAPD), is a measure of the error in estimation of a forecasting method 
in statistics. The MAPE scale is set out in Eq. 1. 


i 
MAPE = — i 


Here, At, Tt, and n represent the actual value, the predicted value, and the number of 
cases tested in the total data set, respectively. The MAPE metric is our error expressed 
as a percentage. Calculation of the error as a percentage will reflect the error more 
objectively, if the magnitude of the values change as a scalar, since this is calculated as 
a percentage to the error. 

RMSE; The mean square deviation (RMSD) or mean square error (RMSE) is a 
commonly used measure of the difference between values predicted by a model or 
estimator (sample or population values) and observed values. The RMSE scale is shown 
in Eq. 2. 


Ytrue — Ypred x 100 (1) 


Jtrue 


1 n 
RMSE = ae (true — Ypred) (2) 


The abbreviations used in the RMSE metric are also used in the same sense as 
MAPE. The performance of the extracted feature and the operated model with these two 
performance scales are given in Table 3 and Fig. 6. 


4 Findings and Interpretation 


By training our data in a comparative way, we will make evaluations by observing the 
effect of the features we add on the result. Two datasets were created, the first contains 
features related to the pandemic and the other does not. We trained these datasets with 
XGBRegressor, LGBMRegressor and Ridge machine learning models, and evaluated 
our models with the performance metrics that occurred in the estimation they made on 
the test data. 

The results we obtained with the test data after the training are given in Table 3 and 
Fig. 6. Considering these results, our model makes 25% improvement in the one step 
ahead sales prediction (Table 4). 
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Table 3. Data information. 


Era Metrics 

MAPE RMSE 
Using Pandemic specific model 48 6.29 
Regular model 36 4.61 


Table 4. Data information. 


Metrics | XGBRegressor LGBMRegressor Ridge 

Pandemic | No Pandemic | Pandemic No Pandemic | Pandemic | No Pandemic 
RMSE 3.11 6.09 6.52 9.94 6.69 7.01 
MAPE | 24.82 64.11 38.35 58.35 59.84 63.61 


Figure 7 in below shows the estimation of our model on the test data graphically. As 
seen in this way, our model generally catches the sales trend, but makes an one step ahead 
prediction that sometimes less or more sales prediction will be made in sales (Fig. 8). 


— yvalid 
— y.valid_pred_xgb 
25 — y valid_pred lgbm 
— y.velid_pred_ridge 
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Fig. 7. Model predictions with Pandemic feature (blue model prediction, red actual test data) 
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Fig. 8. Model predictions without pandemic feature 
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5 Conclusion 


If the market chains cannot manage the stock well, they may face the risk of loss of 
reputation in the public, storage due to excess stock, or deterioration of the product due 
to insufficient supply to the incoming demand. Any market chain sells around 10000 
products and the stock management of this large-volume product portfolio requires 
serious resources. Here, XGBoost, LGBMRegressor and Ridge machine learning models 
were preferred as a method where we can efficiently manage stocks through algorithms 
created with today’s developing techniques and data obtained, update our data despite 
changing conditions, and provide rapid training because there are so many product types. 

As a result of our work, we have developed a new feature that can be used for the 
pandemic period, and by adding this feature to other already existing features, we have 
improved the consistency of our predictions. 

In the future, when feature extraction studies on our data reach the desired point and 
the number of data increases, we are considering using deep learning methods, feature 
extraction and models. The amount of data is of great importance for deep learning 
methods. 

With this developed model, the sales forecast of the products in the portfolio of a 
market can be made and an adequate stock management can be realized in this way. In 
this way, the stock management of the entire market will be fully automated with the 
recommended machine learning model. 

Production and consumption, which is the basis of the economy, is directly related 
to stock management. This study will be an example to the literature for information 
and guidance for each company working on stock management. 
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Abstract. Computers rely on different methods and approaches to 
assess human affective information. Nevertheless, theoretically and 
methodologically, emotion is a challenging topic to address in Human- 
Computer Interaction. Exploring methods for assessing physiological 
responses to emotional experience and for aiding the emotion recogni- 
tion features of Intelligent Virtual Agents (IVAs), this study developed 
an interface prototype for emotion elicitation and simultaneous acquisi- 
tion of the user’s physiological and self-reported emotional data. Supple- 
mentary, the study ventures to combine such data through event-related 
signal analysis. 


Keywords: Emotion Recognition - Multimodal Signal Acquisition - 
Self-report methods 


1 Introduction 


Humans benefit from emotional interchange as a source of information to adapt 
and react to external stimuli and navigate their reality. Computers, on the other 
hand, rely on classification methods to do so. It uses models to calculate and 
differentiate affective information from other human inputs because of the emo- 
tional expressions that emerge through human body responses, language, and 
behavior changes. The present study is structured to investigate methods for 
identifying and interpreting variations of physiological responses related to emo- 
tional states as a way of improving emotion recognition in interactive systems. 
The study frame observes emotional responses during interactions with Intelli- 
gent Virtual Agents (later IVAs) in a simulated context. 

In recent years, increasing investments have introduced IVAs in customer 
service, education, health care, entertainment, immigration settings, and social 
media. IVAs encapsulate the embodiment of different interactive channels and 
establish meaningful communication with humans by enacting emotions, empa- 
thy, and social behavior [6]. The social and emotional capabilities displayed by 
IVAs motivate the users to establish empathy and bonding [5]. Moreover, IVAs’ 
real-time perception, cognition, and emotional awareness bring novel solutions 
for human-machine cooperative tasks [21] within social domains. 
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1.1 Research Goal and Motivation 


This study investigates methods for assessing psychophysiological responses to 
emotional experiences evoked in a simulated IVA interview. Closely, the project 
reflects on the implications of IVA's emotion recognition for applications in 
real-life scenarios, like easing the critical situation of migration management 
in Europe [7,8]. Ultimately, the study encourages an adequate use of emotional 
information and thoughtful development of automated mediators to aid in con- 
texts involving high-stakes decisions that shape people's life chances [8,23]. Iden- 
tifying the asylum seekers’ narratives aspect which proves their situation of fear 
is essential to grating asylum-seeker protection, as the Refugee Status Determi- 
nation(RSD) procedure assesses applicant’s “well-founded fear of being perse- 
cuted” due to “happenings of major importance” and, for that unable to return 
to their home countries ([9] - Paragraphs 34, 36, 32-110). Therefore, to auto- 
mate stages of the RSD assessment, it is critical to explore the design of emotion 
recognition methodologies to assist automated migration management processes. 


1.2 Hypotheses and Research Question 
Two hypotheses are formulated to guide the development of the research: 


— H1: People react emotionally to a simulated interview interaction with a vir- 
tual agent, even in as-if contexts where they are assigned specific roles (“Imag- 
ine that you are -”) 

— H2: Individual human subjects’ responses to the perceived affective infor- 
mation in simulated settings can be identified and related to a set of emo- 
tional states based on a combination of quantitative (psycho-physiological) 
and qualitative (subjective reports) measures of their behavior. 


Correspondingly, three research questions are investigated: 


— RQ1: How do people emotionally appraise the context of the interview? 

— RQ2: Can the individual subjective self-reported emotional experience (quali- 
tative data) be associated with features recognized from the same individual’s 
physiological behavior (quantitative data)? 

— RQ3: Within the context, is it possible to identify certain emotional reactions 
by looking at the patterns of physiological data? 


2 Theoretical Background 


Psychology researchers have waged an endless debate since William James (1884) 
interrogated what emotion is. Nevertheless, there is little consensus around the 
definitions [11], albeit the diverse answers presented. Damasio’s studies described 
that humans use the body as a theater for emotions, supporting that the mind 
is embodied and the experience of emotions is a manifestation of the drives and 
instincts that help the organism regulate itself and respond to changes in the 
environment [24]. As Fontaine and colleagues [4] points out, some elements of 
the emotional experience are uncontroversial: 
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— Certain events tend to elicit specific affective responses in organisms. 

— Physiological changes, behavior expressions, behavioral intention, and direc- 
tion shifts characterize these organism responses. 

— The response patterns produce a conscious feeling of a certain quality in the 
person’s experience over a certain period. 

— Emotional episodes and the associated experiences are labeled by the experi- 
encing person and by the observers with a specific word or expression. 


The appraisal theory of emotion explains that affective states are elicited by 
people’s subjective evaluation or appraisal of the events’ significance for their 
well-being and goal achievement. Also, these theories acknowledge that emotions 
arise from comparing individual needs and external environmental demands. 
Brave and Nass’s investigation remarks that the knowledge of appraisal theories 
serves HCI researchers in modeling and predicting users’ emotional states in 
real-time [10,25]. Likewise, the appraisal theory of emotions can be mapped to 
Artificial Intelligence concepts, like belief-desire-intention models of agency [21]. 

By examining its users’ psycho-physiological cues, computers can sense 
implicit communication by assessing affective and cognitive states. Emotion 
recognition is based on automated classifiers trained to identify general emo- 
tional states from patterns found in the users’ biosignals, whether explicit (i.e., 
measurements of facial expression and gestures) or implicit (i.e., electrocardio- 
graphy or electrodermal activity measurements). Furthermore, it expands the 
interface's limited input modalities (i.e., mouse, keyboard, and camera), mov- 
ing the limits of interaction beyond stated and visible emotion parameters [10]. 
Stemmler’s studies [12] demonstrated relationships between one emotion and 
specific physiological changes, showing that, for instance, fear can be charac- 
terized by strong cardiovascular and electrodermal activity. Although there is 
evidence that it is possible to identify certain emotions through measures of 
bodily changes, the results are confined for multiple reasons. Issues of validity 
and reliability are often pointed out in research [1,3, 13]. 

In general, emotions do not display a universal signature of physiological 
activity for different people, as emotions arise from a unique and personal expe- 
rience for each individual, and the relationship between appraisal of a situation 
and emotional response is, to a great extent, context-bound [10]. Nonetheless, 
recent studies suggest that to foster the development of the classifiers, combined 
measurements of psycho-physiological activity during emotional events are suit- 
able strategies for sorting out overlapping signal problems [11,22]. Moreover, 
self-assessment results can be correlated to the physiological metrics to classify 
emotions measured from various parameters [14]. 


3 Methodology 


This study adopts an experimental approach to examine the influence of ques- 
tions raised by an IVA on participants’ emotions in the context of an automated 
initial interview in the RSD assessment. The researchers developed a within- 
subject study consisting of one session in which participants are presented to 
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one hundred questions, as shown in Fig. 1 below, displayed audio-visually by a 
VA. Precisely, in each session, participants interact with an interview simulation 
where they encounter a looping video of a female avatar that articulates via 
voice-over questions regarding practical and personal matters (stimuli). Essen- 
tially, each question represents one unique condition. The independent variable 
in this study is the answer provided to the emotional self-assessment, and the 
dependent variable is the emotional bodily experience recorded from partici- 
pants. 


1 Session 


— 100 
' conditions 


Participant 


Fig. 1. In a single session, the study exposes participants to the different conditions 


The display of the female VA on the screen remains constant throughout the 
session. Although the virtual agent has no adaptive reaction, its display on the 
simulation is meant to foster the enactment of an IVA presence by emulating 
the gaze of the computer [16]. 


3.1 Experiment Design 


The researchers created a prototype interactive interview simulation that embeds 
simultaneous psycho-physiological data collection. At the experimental level, 
when studying participants’ emotional perception via as-if situations, the nar- 
rative supports psychological immersion into the interaction context [26] and 
furthermore, the fictitious scenario help to situate the participants into the spe- 
cific context of the research [17]. Moreover, inspired by the research on social 
appraisal [2,15], it is assumed that even though participants might answer ques- 
tions based on their own experience, the simulated interview context still allow 
them to identify with the asylum seeker’s emotional situation. 

In this study, context is presented as a narrative prior to the experiment 
and later enhanced by the simulation that situates participants into the RSD 
procedure and invites them to take the role of an asylum seeker. To provide 
stimuli, the study uses references from the list of Affective Norms for English 
Words (ANEW) [19] for phrasing the emotionally evocative questions. The ques- 
tionnaire comprises a hundred closed questions, sixty-five percent of which are 
assumed to be emotionally evocative. The simulation ultimately intends to ben- 
efit from the physiological responses generated over the enunciation of the sen- 
sitive questions and does not have follow-up inquiries. 

Accordingly, as shown in Fig. 2, a question is first enunciated by the virtual 
officer and presented as text on the screen. Once that is completed, partici- 
pants are taken to the self-report screen and asked to select from the wheel the 
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label and intensity of the emotion they experienced. After completing the self- 
assessment, participants are presented with the answer options. Finally, after 
answering that question, participants are presented with a new one. Random- 
ization of the questions was not included in the design of the simulation. 


1- Question Presentation 2 - Emotional Self-report 3 - Question Answer 


4 - Repeat 


Fig. 2. Screenshots illustrating the loop of tasks through which participants labeled 
the emotional state experienced for each question presented by the virtual officer 


As the experiment design establishes no fixed time for completing the simu- 
lation, every time a new question is presented to participants, a mark is created 
in the database to inform the start of the question. The approach is justified for 
not inducing pressure on participants during the simulation play. 


3.2 Procedure 


It is worth remarking that the experiment was piloted twice to test out a) the 
simulation setup and questions structure and b) data collection methods. After 
fine-tuning the latest adjustments, the experiments were carried out in labo- 
ratory settings under uniform conditions. Figure3 illustrates the experimental 
procedure designed for the study. 

Experiments were conducted during the COVID-19 pandemic; thus, the nec- 
essary safety measures stood as part of the procedure. At first, participants 
answered a BIS/BAS questionnaire [20] to analyze the symmetry of personality 
in terms of the motivational systems underlying behavior and affect responses. 
When conforming, participants were considered apt for the experiment. Later, 
information regarding the experiment, the purpose of the study, and the technical 
apparatus was provided. Participants were exposed to the contextual narrative 
upon signing the consent form. Following, the sensors were connected, as it is 
shown in the Data Collection section below. Before engaging in the simulation, 
participants were invited to perform a simple breathing exercise. It is notewor- 
thy that, while playing the simulation, participants were alone, and monitored 
from a nearby room via a camera streaming setting. 


3.3 Participants 


Seven (7) English-speaking individuals currently residing in Tallinn, Estonia, 
participated in the study. Participation was voluntary, and the age range of 
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Fig. 3. Stages of the experimental procedure 


participants was 18-40 years old. Even though the broader context of the study 
refers to the vulnerable population (refugees), no actual refugee applicant or 
any other vulnerable group was included in the study sample. Nevertheless, 
recruitment embraced participants from foreign countries that would be more 
likely to have a reference of the border control inquiries. 


3.4 Data Collection 


The simulation embeds a data collection procedure that synchronizes partici- 
pants’ physiological data with their emotional self-reports. The goal is to enable 
the participant to experience and disclose emotional states during the physio- 
logical data acquisition without disruption or further interference. 


Physiological Data. Participants’ physiological signals are gathered via BITal- 
ino Plugged kit, a low-cost biosignal acquisition device [27]. The sampling fre- 
quency of the used ECG is 1,000 Hz, as recommended by [28]. EDA and fEMG 
measurements are collected at the same sampling rate due to using a single 
BITalino device to collect the three physiological at the same time. The BITalino 
device was connected to a high-performance computer via Bluetooth. Partici- 
pants’ physiological information thus is collected from three channels - ECG, 
EDA, and EMG - and continuously recorded using the compatible software 
OpenSignals’. 

Sensors were placed on the clean surface of participants’ skin, following rec- 
ommendations of reviewed literature [22,27]. Figure 4 illustrates the configura- 
tion used in the experiments. Two electrodes were used to measure EDA, placed 
on the thenar and hypothenar eminence of the right hand as illustrated in Fig. 4a. 


! Available at - https://support.pluxbiosignals.com/knowledge-base/introducing- 
opensignals-revolution/ 
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Fig. 4. Electrodes placement for biosignal data collection 


For collecting ECG, three electrodes were placed on the participants chest, two 
on both sides of the clavicles and one at the lower left part of the rib cage, as it is 
illustrated in Fig. 4b. Additionally, facial muscle activity (fEMG) was acquired 
via three electrodes placed on participants” face, the reference electrode in the 
middle of the forehead and the two others on the Corrugator Supercilli and 
Zygomathicus Major, as in Fig. 4c. 


Emotion Self-report. To collect the emotional self-reports of experiment par- 
ticipants on the go, the interface simulation uses a digital tool based on the 
self-report paradigm proposed by the Geneva Emotion Wheel (GEW) [2]. Self- 
report paradigms are used to grasp the individual nature of emotional experience, 
which reflects the integration of mental and bodily changes in the context of par- 
ticular events. Researchers in emotional labeling point out that even though the 
results obtained by these paradigms are plausible and interpretable, the statis- 
tical analysis is hampered by the abundance of emotion labels collected, making 
the interpretation complex [2]. This way, embedding the GEW into the sim- 
ulation allowed the emotional assessment to be done interactively through an 
adapted version of the tool and provided homogeneity of reports. The choice 
also benefits the processing and synchronization of physiological data collected 
through the sensors. 


3.5 Preprocessing Data 


The multisource dataset required a composite of processing and analysis meth- 
ods. Constraints faced at this stage caused the removal of the fEMG data from 
analysis. At first, the ECG and EDA data was downsampled from 1000hz to 
100hz to reduce computational overhead. Thereafter, filtering to remove noise 
from the ECG and EDA data was done automatically as part of the Neu- 
rokit2data preprocessing pipeline [18]. Furthermore, we divided the ECG and 
EDA data into epochs with a duration of 10s, starting one second before the 
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question presentation and finishing nine seconds after that. Compiling the infor- 
mation into epochs returned a list of numeric identifiers for the physiological 
reactions regarding each question. The 10s epoch duration was based on the 
calculation of the participant’s question average time (Max: 12min 32s, Min: 
08 min 15s). Hence, each participant produced 100 data samples, thus giving us 
a total of 700 data samples. Two features were extracted from ECG data, ECG 
Rate, and ECG R Peaks. From EDA data, we obtained EDA Phasic and EDA 
Tonic features. All these were achieved using the Neurokit2 python library [18]. 


3.6 Analysis 


Following the dataset containing the labeled psychophysiological data of the 
sample (N = 7) was statistically analyzed, focusing on features of the physiologi- 
cal measurements acquired alongside emotional self-reports. Independent t-tests 
(performed two-sided) were used to draw comparisons between the emotional 
reports and their specific physiological reactions. A P value of 0.05 was assumed 
to indicate the statistical significance of the tests. Furthermore, correlation tests 
were performed to observe the relationship between the physiological data fea- 
tures and the emotion labels indicated by the participants. The statistical anal- 
ysis was performed using SPSS software (ver. 28.0.1.1). 


4 Results 


Among the negative valence emotions, the most reported emotion was Disap- 
pointment, representing 12.9% (n = 90) of the total answers. Fear was reported 
in 9.4% (n = 66), followed by Sadness in 9.1% (n = 64) of the answers gathered. 
Among the positive valence emotions, Relief was the most reported, chosen at 
8.0% (n = 56) of the time, followed by Interest, reported at 6.4% (n = 45), and 
Pride, 4.0% (n = 28). Issues with the self-assessment led the analysis not to 
include the variations of intensity reported by participants. Manifested differ- 
ences between the data across participants hampered a separate analysis of the 
emotional reactions, leading the study to consider the emotions between negative 
and positive valence. 

The analysis has shown that the negative stimuli increased cardiac activity 
in participants compared to positive stimuli. The mean of the ECG Rate for 
Positive emotions was M = .6452 (SD = .6760), and for Negative emotions, it 
was M = .6645 (SD = .7092). However, no significant difference was found in 
the ECG Rate between the emotions of positive and negative valence (T623 
= —.330, p = .741). This way, tracing general conclusions about participants’ 
emotional arousal through examination of cardiac activity was unattainable. 

A similar issue was found in the features of EDA data. Measurements of the 
EDA Tonic were scattered and not pronounced. The little difference between the 
EDA Tonic values could indicate that the quick flow of questions had piled up 
different stimuli and affected the measurements in the tonic component, which 
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is distinguished for happening slowly over time. On the other hand, measure- 
ments of the EDA Phasic suggested that negative emotion might have increased 
electrodermal activity response if compared to the positive reports. The mean 
of EDA Phasic for Positive emotions was M = —1.5396E—4 (SD = 1.243E—3), 
while Negative emotions had a mean of M = —4.9404E—4 (SD = 1.859E—3). 
Additionally, the parametric test revealed a significant difference in the means 
of EDA Phasic between groups of positive and negative emotions (T601.852 = 
2.728, p = .007). 

Furthermore, Pearson's correlation tests were performed to observe the rela- 
tionship between the EDA and ECG measurements. Results have indicated that 
ECG Rate and EDA Phasic have a significant, weak negative correlation (r = 
—.316, p<.001). In contrast, ECG Rate and EDA Tonic have a significant moder- 
ate, positive correlation (r = .603, p< .001). Such findings can indicate measures 
to consider in future studies. 


5 Discussion 


In an experimental investigation, we showed that our prototype enabled the 
simultaneous collection of participants” quantitative and qualitative affective 
information during a simulated interview. This pilot study allowed us to identify 
what needs to be iterated for the upcoming trials of this study. Moreover, the 
process of the study illustrated the complexities entangling psycho-physiological 
studies in HCI. The emotion elicitation method and designed data collection 
interface had a positive outcome. However, generalized conclusions regarding 
associations of self-reported emotional experience and the two features of phys- 
iological measurements, EDA and ECG, were not attainable. 

The analysis of the ECG and EDA measurements across different emotion 
reports did not allow the distinction of specific emotions experienced by the 
participants. Still, the results aligned with the previous studies showing that 
negative emotions are characterized by strong cardiovascular and electrodermal 
activity [12]. Besides, the results suggested that combinations of measures of 
ECG Rate could be used alongside EDA Tonic or EDA Phasic to detect positive 
and negative emotions. 

The interview context was appraised negatively among participants, as Dis- 
appointment and Fear were the most reported emotions among the responses. 
For that, the fictitious scenario was considered enough to situate the participants 
in the research context and allow them to identify with the emotional situation. 
Besides, the different sorts of appraisal triggered by the simulation suggested 
that using words from the ANEW list [19] had contributed to the necessary 
stimuli and evoked different assessments of the given stimuli. 


6 Conclusion 


In general, emotion recognition is based on automated classifiers trained to iden- 
tify general emotional states from patterns found in the users’ biosignals, whether 
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explicit (i.e., from facial expressions and gestures) or implicit (i.e., from biosen- 
sors acquisition). Converged tasks are required for achieving accuracy and reli- 
ability in such systems [11,14,22]. HCI practitioners and designers should take 
advantage of physiological and affective computing methods to develop accurate 
emotion recognition models. This study adds to the evidence that simulated set- 
tings could be used for eliciting emotional reactions, especially if the content of 
the simulation is framed through validated emotion elicitation approaches, such 
as the affective words of the ANEW list [19]. The findings of this pilot pave the 
way for novel methods that enable the understanding of emotional experiences in 
simulated interviews. We envision applications of the methods for assessing the 
user experience’s emotional aspects with other interactive systems and different 
contexts. 


Limitations and Future Studies. We acknowledge that emotion is a subjec- 
tive context-dependent experience bound by different factors, whereas the indi- 
vidual appraisal of the situation, culture, age, and gender, among other aspects. 
Thus, the specificity of each individual appraisal and reaction to the emotional 
stimuli makes the generalizations about the relationships between the perceived 
affective information and the physiological reaction measurements very complex. 
Future research should account for the different physiology of people. Moreover, 
calibrated individual measurements could help to investigate bodily reaction pat- 
terns. Adjustments to experiment settings should cover the definition of a fixed 
length for the stimuli, the review of the simulation content, and the possibility of 
a less extensive experiment setting. Also, a larger sample size would be required. 
Refinements could also approach developing the necessary pipelines and testing 
different data processing methods, feature extraction, and analysis to achieve 
generalization. Trends among the data could be analyzed in future research by 
classification and clustering. Future studies would also benefit from using an 
alternative experimental design to test the influence of the specific aspects of 
the simulation. 


Ethical Implications. Recognizing the emergent risks connected to the 
automation of refugee status determination assessment is necessary. The nov- 
elty of the IVAs may serve as a tool for facilitating the initial screening at the 
border control stations. Nonetheless, the emotion monitoring for the adaptabil- 
ity of such systems should be made explicit to the users before the interaction. 
The access to a person’s mental and emotional states may be seen as invasive 
and place users in a vulnerable position, besides compromising the levels of trust 
in such interactions. Foremost, the assessment of implicit biosignals should not 
serve surveillance purposes nor deception detection. 
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Abstract. NAO is a small humanoid robot which affords multimodal interaction 
through speech, non-verbal sounds, visual pattern recognition, gestures and touch. 
NAO can be animated to move its head, arms, legs in space and to manifest 
emotional reactions through dialogue, sounds, body movements and light effects. 
This paper reports on the design, implementation, play-testing and evaluation of a 
multimodal, playful interaction with NAO in two pre-studies and one pilot study 
with altogether 209 participants of all age groups. The application “NAO says” 
was designed based on the popular imitation game “Simon says”, in which three 
or more players follow the command “Simon says”. In “NAO says” the robot 
plays the role of Simon and asks the players to play a series of mini-games by 
imitating body movements and solving simple mathematical riddles. The design 
of “NAO says” focuses on creating an experience of a less constrained, playful 
interaction rather than following strict rules of the game. The paper describes 
the design of the game, the implementation in pilot studies and the results from 
three evaluations which investigated the perceptions of NAO as a game leader, 
and perceived psychological stress before and after the playful interaction with 
the robot. The results indicate that the robot was perceived as a friendly, joyful 
and pleasant interaction partner and that perceived stress was lower after playing 
the game. 


Keywords: Human Robot Interaction (HRI) - Playful interaction - Interactive 
games - Humanoid robots - Social robotic game - Stress reduction 


1 Introduction 


Humanoid robots offer new opportunities for playful interactions with humans. Playful 
interaction design in Human Computer Interaction (HCI) is rooted in the perspective 
of humans playful creatures or as “Homo Ludens” engaging in playful, ludic activities, 
which take place within fixed limits of time and place, and according to freely accepted 
but binding rules [1]. Playful activities absorb the players and evoke intense feelings 
of joy and tension or excitement, bringing the players beyond the experience of the 
“ordinary life” [1]. The distinction between playful interactions, serious games and 
gamification has been made in relation to the non-utilitarian character of play [2]. Playful 
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interactions have been associated with such aspects as curiosity, exploration and the 
experience of wonder [3], as well as with an inherently pleasurable experience [4]. 
Research studies explored the possibilities and effects of playful interaction in the field 
on Human Robot Interaction (HRI) including educational robotics. For example, [5] 
developed an educational robotic system with a driving robot and a programming-board 
with command-bricks to support tangible, social and playful interactions in context of 
school education. Other studies with children, focused on the potential of playful human- 
robot interaction for learning and cognitive development. For example [6] designed 
playful child-robot interactions for language learning and the results showed promising 
effects. Studies have shown that playful interactions with technologies can be beneficial 
not only for children but for human players of all ages. Playful activities develop their 
potential in physical and social interactions especially through the engagement of the 
whole-body of all players [7]. These aspects, especially physical and social aspects of 
playful interactions including whole-body engagement, have played an important role 
in the design of the application “NAO says” presented in this paper. 

The application “NAO says” is based on the popular imitation game “Simon says”, 
in which three or more players follow the command “Simon says” and act on the follow- 
up task. If, however, the game leader Simon does not say “Simon says” but a player 
acts according to the task, this player must quit the game. In “NAO says” the humanoid 
robot NAO plays the role of Simon and asks the players to engage in playing a series of 
mini-activities by following the command “NAO says”. NAO is a small humanoid robot 
which affords multimodal interaction through speech, non-verbal sounds, visual pattern 
recognition, gestures and touch. NAO can be animated to move its head, arms, legs in 
space and to manifest emotional reactions through dialogue, sounds, body movements 
and light effects. The mini-activities include imitating body movements of the robot 
and solving simple mathematical riddles while playfully interacting though speech and 
touch. These playful interactions are embedded in a social context and include a group 
of players following the commands of the robot together at the same time. “NAO says” 
as a design-led intervention aims to create a pleasurable experience through playful 
human-robot interaction embedded in less constrained settings, without the pressure to 
follow strict rules of a game. 

This paper reports on the design, programming, implementation, playtesting and 
evaluation of this multimodal, playful interaction with NAO in two pre-studies and the 
main pilot study with altogether 209 participants of all age groups. The reminder of this 
paper is structured as follows. Section 2 outlines the design and the programming of the 
application “NAO says” with references to the different versions of the game “Simon 
says”. Section 3 describes the design of the studies themselves, i.e. two pre-studies with 
university students and the main pilot study, and the research methods applied in these 
studies. Section 4 presents the results from the two pre-studies and the main pilot study 
focusing on players’ perceptions of the NAO robot as game leader and the game “NAO 
says”, and perceived levels of stress before and after playing the game. The paper ends 
with conclusions and recommendations for further research. 
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2 Design and Programming 


The application “NAO says” was designed to afford playful interactions with the 
humanoid robot NAO and was inspired by the popular imitation game “Simon says”. 
The game “Simon says” is structured around playful interactions in a group of players, 
who follow the command “Simon says”, engaging in a series of playful mini-activities. 
These mini-activities in the “Simon says” game vary from game to game version, but 
usually include physical, cognitive and social components which allow to engage gaze, 
speech, and motion. “Simon says” game has been developed and applied in many dif- 
ferent versions. The classic version of the game involves human players playing in the 
physical room and one of the players taking the role of Simon. Studies on “Simon says” 
with human players explored its effects for learning outcomes, especially in language 
learning. The study by [8] showed that playing “Simon says” had a significant effect on 
listening comprehension of senior high school students. The study by [9] showed that 
the game improved vocabulary mastery in learning English. 

More recently, different versions of the “Simon says” game have been developed 
using technologies to support the gameplay. For example, [10] developed the “Simon 
says” game as a mobile application for mobile phones with commands focusing on color 
identification. A study in the context of long term care (LTC), applied a “Simon says” 
activity with a robot, in which older adults took turns as leaders [11]. The researchers 
concluded that robots are promising for social engagement of older adults who suffer 
from apathy [11]. Another version of the “Simon says” game was developed with a 
humanoid social robot and included a computational model of turn-taking to support a 
more natural interaction during the gameplay [12]. Finally, the study by [13] focused 
on bodily movements and implemented the human pose detection library OpenPose to 
capture players’ poses [13]. 


2.1 Design 


The design of the “NAO says” game presented in this paper focused on the multimodal 
and multi-sensory playful interaction of the NAO robot and human players of all ages. 
The use case scenario for the design of the “NAO says” game was a popular public 
event “Long Night of Sciences” which takes place every year at research institutions 
all over Germany on a specific day in June. On this day, scientific and science-related 
institutions open their doors and invite general public to visit and actively participate in 
experiments, demonstration, lectures, science shows, and guided tours. Playing “NAO 
Says” was offered as an interactive game event during “Long Night of Sciences 2022” 
at Berlin University of Applied Sciences, Germany, on 2nd July 2022. The “NAO says” 
game was embedded in a social context of this public event with multiple, voluntary 
participants engaging in playful interactions with the robot. The setting of the scenario 
was defined to be a university laboratory room, which was open during the event for 
general public to walk in and participate in the game at defined times. Based on previous 
experiences from the event “Long Night of Sciences” and the character of the “NAO 
says” game, the scenario defined families with children and young people as the primary 
audience and target group. 
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The “NAO says” game was designed from the human-centered design (HCD) 
perspective, following the scenario-based design (SBD) approach [14]. Scenarios are 
task based and descriptive, i.e. events and activities are strung together in purposeful 
sequences and provide a real-world description of the contents, flow, and dynamics 
[14]. The design was developed and tested iteratively. The design process included a 
joint co-creation of a scenario-based script in a project team, joint programming by two 
authors of this paper, playtesting in two pre-studies with university students, and finally 
the implementation and evaluation in the main study with 190 participants of all ages 
during the event “Long Night of Sciences 2022”. 

The gameplay in “NAO says” includes a series of playful mini-activities which 
encompass physical and cognitive tasks. During the gameplay, human players are asked 
by the NAO robot to follow only when they hear the command “NAO says”. The rule 
from the classic version of the “Simon says” game, in which a player drops out of 
the game if he/she follows although there was no command, was not included in the 
interaction design. The reason not to include this rule was the focus on the playful, more 
stress-free and less competitive interaction rather than following the strict rules of the 
game and players having to quit the game. 

The “NAO says” gameplay includes a total of ten playful mini-activities. Some of 
these mini-activities were based on existing animations for NAO, which are available in 
standard libraries of the Choregraphe software used to program the NAO robot. These 
included the “Saxophone”, "Elephant", “Gorilla” and “Take a picture”. These ready- 
made building blocks for the game were selected as suitable for playful interactions, 
since they contain both clear body movements and corresponding sounds, which enhance 
playful engagement. Further existing animations, such as the “Air guitar”, were combined 
with new sound effects, which were imported as free sound files from the Internet 
and integrated in Choregraphe. For the purpose of the “NAO says” game, some own 
animations were programmed in Choregraphe and added to the gameplay. The self- 
developed animations included: “Stand on one leg”, “Rub tummy, pad head”, “Wave 
arms above the head”, “Smile” and “Maths”. In total, the following ten mini-activities 
were used in the “NAO says” gameplay: (1) Gorilla, (2) Elephant, (3) Air guitar, (4) 
Saxophone, (5) Take a picture, (6) Stand on one leg, (7) Rub tummy, pad head, (8) Wave 
arms above the head, (9) Smile, and (10) Maths. Figure | visualises all mini-activities 
arranged into categories: animals, music, body and other. 
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Animals Music Body Other 


Elephant Saxophone One leg Smile 


Gorilla Air Guitar Rub tummy Take picture 


Wave arms Maths 


Fig. 1. Playful mini-activities included in the “NAO says” game. 


2.2 Programming 


The programming of the “NAO says” game was done using the Choregraphe software 
(Version 2.8.6). The game was designed in the English and German language versions 
which were tested in two pre-studies with students. Figure 2 visualises the programming 
of the “NAO says” game in Choregraphe with different elements such as animations, 
animated say, speech recognition and tactile sensors (bumpers). 
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ENG 


Text Edit 
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|e<74ci/ event from armi Welcome to the 
‘game: Nao says! 
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the game quickly. 


v 
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Fig. 2. Programming of “NAO says” in Choregraphe (English version). 


When programming the game, four main challenges emerged: (1) How to program 
new mini-activities and which existing animations can be adapted for the purpose of 
“NAO says”?; (2) How to make the interaction with NAO possibly seamless with a 
larger number of humans players and observers present in the room and standing at a 
relatively large distance from the robot?; (3) How to make the gameplay exciting without 
repeating the same sequence of mini-activities? These challenges were addressed in the 
programming phase as described below. 
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How to program new mini-activities and which existing animations can be adapted 
to the purpose of the game? In order to create new mini-activities, such as “Stand on one 
leg” or “Rub tummy, pat head”, the Timeline editor, an integrated tool in Choregraphe, 
was to store different positions of the robot in a timeline and play one after the other as a 
fluid sequence of animations. The Choregraphe Timeline editor enables the programmer 
to adjust every single angle of NAO’s motors using simple sliders. Additionally, this 
method was combined with the use of the Animation mode. When NAO is in Animation 
mode, the angles of the joints on the real robot can be adjusted and the joint position 
saved as a point on the timeline. From there, NAO can be moved to the next position. 
This procedure results in a fluid animation from the individual positions at the end. When 
creating animations, special attention must be paid to the length between the individual 
positions of the animation. If the length is too short, jerky movements can occur. These 
not only look unnatural, but also cause the motors to heat up more quickly which can 
even damage the robot. In addition, special care must be taken to ensure that the NAO 
robot is always in a stable position. The ready-made animation “Gorilla” available in the 
Choreographe library demonstrates this problem: In “Gorilla”, NAO drops forward onto 
its hands and sometimes falls over, either because the animation is executed too quickly 
or because the ground is not ideally flat. Therefore, creating new animations like “Stand 
on one leg” was particularly difficult to implement, as NAO had to be kept in a stable 
position during the entire animation. 


How to make the interaction with NAO possibly seamless with a larger number 
of humans players and observers present in the room and standing at a relatively 
large distance from the robot? The scenario was designed for participation of multiple 
players and observers, all present in one room with NAO during the public event at 
university premises. The presence of many participants enhances the risk of a high 
volume of background noises which may impede speech recognition of the NAO robot. 
Therefore, the decision was made to limit the number of mini-activities with human- 
robot interaction via speech. In fact, the only mini-activity, in which speech input from 
the participant is necessary, is the maths activity. In the maths activity NAO asks “How 
much is 3 multiplied by 3?” and expects the answer “nine” from the participants. Also 
during the game “NAO says” NAO also asks a number of times “Did you understand 
everything?” and waits until the answer “Yes” is said by a participant. However, these 
interactions via speech are only possible when there are no background noises in the 
room and possibly only one participant at a time speaks loudly and clearly. In the scenario 
at the public event with approx. 20 participants in the room at the same time, a loud and 
clear response was foreseen to not be feasible. Therefore, it was decided to keep the 
threshold very low so that everything that sounds similar to “nine” could be recognized 
by the NAO robot as nine. However, this method had the disadvantage as, for example, 
“nineteen” or other numbers are also recognised as “nine”. Therefore, the final threshold 
was set to 30%, i.e. any utterances that sounds 30% like “nine” are recognized as a “nine”. 


How to make the gameplay exciting without repeating the same sequence of mini- 
activities? As described above, a pool of mini-activities was created to provide a variety 
of non-recurring playful mini-activities, and in this way to enhance the user experience. 
In order to make the gameplay exciting, the randomization principle was applied in the 
programming of the game. Randomness of game elements is linked to uncertainty, which 
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is a frequently overlooked in game design, but an important element for the overall game 
experience as it holds players’ interest and enhances engagement [15]. To incorporate the 
randomization for the path-finding command “NAO says” before movement, a random 
variable was included which was then queried with an If-condition. The challenge here 
was that the positive feedback (praise) given by the NAO robot at the end of the mini- 
activity was not in the same programming level as the dice rolling of the random number. 
To address this challenge, a separate random number was chosen for each mini-activity 
and duplicated this mini-activity. In the final version, different responses of the robot 
were programmed depending on the use or non-use of the command “NAO says”. 

The different versions of the “NAO says” game were tested in the two pre-studies 
and the final version in the main pilot study at the public event as described below. 


3 Methods and Studies 


Following the design and the programming phase, the game was play-tested with uni- 
versity students in two pre-studies and the final version was implemented and evaluated 
during the public event with participants of different ages. Playtesting is a popular method 
in game research used to test perceptions and preferences of players, allowing designers 
to modify the game before delivering the final version [16]. The key facts about the 
pre-studies and the pilot study are summarized below and in Table 1. 


Pre-study 1. The first pre-study involved a sample of ten university students, who 
volunteered to test the English version of the initial version of the “NAO says” game. 
Participants were asked to fill in an online survey before and after the game. One of 
the key results from the first pre-study was the wish of students to play the game in 
the German language version. Therefore the German language version was created in 
Choregraphe and tested in the second pre-study. 


Pre-study 2. The second pre-study involved a sample of nine students, who volunteered 
to test the German version of the “NAO says” game and did not participate in the first 
pre-study. This version of the game also included a slower pace of NAO’s speaking as the 
result from the first pilot study clearly indicated the need for slower speed to understand 
better what to do in each mini-activity. Like in the first pre-study, the participants were 
asked to fill in an online survey before and after the game. 


Pilot Study. The main pilot study took place during the public event “Long Night of 
Sciences” with participants of different ages. Out of approx. 260—280 participants on that 
day, 190 persons filled in the evaluation survey which was administered before and after 
playing the game with NAO. The survey was paper-based to ensure high participation 
of persons without digital devices and of younger children. 
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Table 1. Summary of the pre-studies and the main pilot study of the game “NAO says”. 


Pre-study 1 
n= 10 


Pre-study 2 
n=9 


Pilot study 
n= 190 


English version 


German version 


German version 


University students 


University students 


General public 


Classroom setting 


Classroom setting 


Public event setting 


50% female, 50% male 


44% female, 56% male 


46% female, 51% male, 3% diverse 


70% 20-24 years old 
30% 25-29 years old 


56% 20-24 years old 
22% 25-29 years old 
11% 30-34 years old 
11% 35-39 years old 


3% younger than 7 years old 
31% 7-18 years old 

35% 19—29 years old 

5% 30-39 years old 

17% 40-49 years old 

8% 50-59 years old 

1% 60 years old and older 


4 Results 


The key results from the studies related to: (1) perceptions of the robot and the game 
“NAO says”, and (2) perceived stress level before and after playing the game are 


described in the sections below. 


4.1 Perceptions of the NAO Robot and the Game “NAO Says” 


The data about the perceptions of the participants of the NAO robot as game leader and 
of the game “NAO says” was collected via online surveys in the pre-studies and via 
a paper-and-pencil survey in the main study. Both online surveys included additional 
questions which were not asked during the main study due to the specific conditions of 
the public event. The online surveys ask the question How did you perceive NAO as a 
game leader? This question was answered by rating five pairs of semantic items from 
the Likeability Scale of the Godspeed questionnaires rated on a scale from 1 to 6 [17]. 


Table 2 summarises the results from the Likeability Scale. 


Table 2. Perceptions pre-studies and the main pilot study of the game “NAO says”. 


Pre-study 1 


Pre-study 2 


unlikeable (1) — likeable (6) 


M = 5.40 (SD 1.265) 


M = 5.89 (SD .333) 


unfriendly (1) — friendly (6) 


M = 5.30 (SD 1.337) 


M = 6.00 (SD .000) 


unkind (1) — kind (6) 


M = 5.40 (SD 1.265) 


M = 5.89 (SD .333) 


unpleasant (1) — pleasant (6) 


M = 5.20 (SD 1.317) 


M = 5.56 (SD 1.014) 


awful (1) — nice (6) 


M = 5..30 (SD 1.252) 


M = 5.56 (SD 1.014) 
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The results show that in both pre-studies NAO was perceived as a likeable, friendly, 
kind, pleasant and nice game leader. The data also shows higher values in the pre-study 
2 in which the German version of the game was used which may indicate that the use of 
the local language may have enhanced positive perceptions of the robot. 

Next, perceptions of playful interactions with the robot were captured in all three 
studies using the simple question “How did you like the game?” and asking participants 
to assess their perception on a scale from 1 = not at all, to 6 = very much. The mean 
values were as follows: (1) pre-study 1, n = 10: M = 5.10 (SD .738), (2) pre-study 2, n 
= 9: M = 5.33 (SD .707), (3) pilot-study, n = 190: M = 5.25 (SD = .885). These results 
indicate that participants in all three studies enjoyed playful interactions with NAO. The 
foreign language version of the game in English in the first pre-study received the lowest 
value, which again indicates that the language choice affects user experience. The high 
average rating of M = 5.25 in the main study with 190 participants show that participants 
in different age groups liked the game. 


4.2 Perceived Level of Stress Before and After the Game 


Perceived psychological stress was measured to explore whether there were any changes 
in how stressed or relaxed participants felt before and after playful interaction with NAO. 
The data about perceived stress was collected via online surveys in the two pre-studies 
and via a paper-and-pencil survey in the main study. Psychological stress was reported by 
the participants before and after playful interactions with NAO using the Perkhofer Stress 
Scale, which is a validated single item scale [17]. The participants assessed their stress 
level on the scale from 1 = no stress (“fully relaxed”) to 6 = fully stressed (“‘anxious” 
before and after the game. To explore the differences in perceived stress before and after 
playing “NAO says”, the dependent samples (paired) t-test was computed at the 95% 
confidence level and two-tailed p-value using IBM SPSS software. The comparison of 
means showed that in all three studies the mean values for perceived stress before the 
game were slightly higher compared to the values after playing the game. In the first 
pre-study (n = 10) the mean values were M (before) = 2.60 (SD .843) and M (after) 
= 2.40 (SD 1.174). In the second pre-study (n=9) the mean values were M (before) = 
2.56 (SD 1.014) and M (after) = 1.33 (SD 1.000). In the third pre-study the mean values 
were M (before) = 2.11 (SD .910) and M (after) = 1.74 (SD .917). Table 3 summarises 
the results for all three t-tests. 


Table 3. Paired samples t-test: Perceived psychological stress before and after the “NAO says” 
game. 


Pairs | Mean | Std. Deviation | Std. Error Mean |t d Sig. (2-tailed) 
Pre-study 1 | 10 |.200 | 1.229 398 514 9 | .619 
Pre-study 2 9 | 1.222 | 1.302 434 2.817 8 |0.23 
Pilot study |190 |.374 | .949 .068 5.488 |189 | <.001 
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The results show that the means of perceived stress before and after the game differed 
significantly only in the main pilot study. It can be concluded that the perceived level 
of stressed was statistically lower after the game “NAO says” and changed from 2.11 
+ 0.91 to 1.74 + 0.92 (p < 0.001). The results also show that the initial stress level 
of the 190 participants in the main study during the public event was slightly lower 
compared to the two pre-studies with students, which may be explained by the leisure 
character of the event compared to participation in classroom settings. Nevertheless, 
the participation of volunteers visiting the laboratory during the public event limits 
the possibilities of generalizing the results of the study. It scan be assumed that the 
participants in the main study differed from the general population and from populations 
with special needs in regards to their level of initial motivation to participate as well as 
their interest in and attitudes towards robots. Therefore it is recommended to conduct a 
broader follow-up study involving a more diverse sample and including variables such 
as interest, motivation and attitudes towards robots. 


5 Conclusions 


This paper reported on the design, programming, implementation and evaluation of 
playful interactions during the game “NAO says” with the humanoid robot NAO in two 
pre-studies with students and one pilot study with 190 participants of different ages. 
The exploratory results in all three studies showed that the players perceived NAO as a 
likeable, friendly, kind, pleasant and nice game leader, and enjoyed playful interactions 
with NAO. Additionally, there was a significant difference in the perception of own’s 
psychological stress before and after the game with NAO in the pilot study with 190 
participants. The results also indicate possible effects of different language versions of 
the game on user experience. The results presented in this paper are to be understood 
as preliminary, exploratory results and as a starting point for further research. Further 
studies should be conduced with diverse samples and look closer into possible effects 
of different versions of the game. The paper also pointed out several challenges in the 
design and programming of the game “NAO says” and how these were addressed. Further 
studies could explore in more detail which design strategies of playful interactions in 
games like “NAO says” and which types of feedback from the robot are most effective 
for specific target audiences. Furthermore, future studies could explore how different 
types of playful interactions with robots may affect the perception of mood and stress 
as well as physical stress measures. 
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Abstract. Air quality affects health and, unfortunately, has been dete- 
riorating rapidly recently. The problem is significant in smaller towns 
where the important source of pollution is the heating of households 
and water from individual sources. Therefore, the inhabitants have an 
influence on a significant reduction of pollution in their area, but at the 
same time, they are very often not aware of it. Raising awareness about 
household-related air pollution plays a vital role as systemic solutions 
proposed by state and local authorities require the support of local com- 
munities. The situation has recently become even more serious as we are 
facing a crisis caused by the Russian war in Ukraine, which has led to 
an increase in energy and fuel prices and has postponed restrictions on 
the use of solid fuels or even incentives to use inferior fuels. Pathologies, 
such as burning garbage in old-style furnaces, have still not been elim- 
inated. One of the ways of raising citizens’ awareness was to be public, 
easily accessible information about air quality. Many portals, services, 
and applications currently provide local air quality data, but few peo- 
ple use them. One reason may be that the figures and graphs can be 
confusing or unattractive to audiences who are not used to reading sci- 
entific reports. Visualizing air quality with augmented reality overcomes 
these obstacles. A mobile application that can use local elements as trig- 
gers and a symbolic representation of air quality based on data read in 
real-time from sensors is simple, attractive for non-experts, and has an 
additional educational value. We present the experience of creating such 
an application and prototype tests with the participation of potential 
users. Unfortunately, the collected results confirm the low awareness of 
excessive pollution in a given area and its negative impact on health. 
However, the interest of potential users and positive opinions about the 
tested prototype fill with optimism. 


Keywords: augmented reality - air pollution - user experience - art 
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1 Introduction 


Visualizing air pollution in the real world is not a new phenomenon, but it 
has largely been contained in the domain of arts, with projects aiming to make 
the local population aware of the pollution’s negative effects on their health. A 
traveling art project, giant lungs made of air filters whose color changed with 
time when exposed to polluted air, in multiple editions toured the most polluted 
cities in Poland! Such projects have great value, yet they have limitations as they 
are confined to a specific location and do not show dynamic changes in pollution 
rates. Still, such campaigns are a crucial step in limiting pollution in the most 
polluted areas, especially if pollution comes from residential sites. A decrease 
in emissions can be seen following informative action [4], and such initiatives 
often pave the way for necessary policy. Still, it is as they garner public support 
for such initiatives. For these reasons, we have decided to design and verify an 
Augmented Reality (AR) application representing current pollution levels based 
on location data. Seeing real-time pollution data on a personal mobile device 
and having it visualized in a manner that is meaningful to the device-holder at 
a place where they are currently has the potential to increase awareness of air 
pollution presence and its negative health effects. 
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Fig. 1. PM2.5 values measured every hour during one day in winter and summer 2021; 
a town with up to 25,000 inhabitants, data from the Environmental Protection Inspec- 
tion [5] 


1.1 Household Related Air Pollution 


Air quality affects the health of citizens and, unfortunately, recently has been 
deteriorating rapidly. However, when effects are delayed, people tend to disre- 
gard their severity, even though multiple papers closely link negative health out- 
comes, especially respiratory diseases, and reduced lung capacity, to air pollution 


1 The images from the project, by campaign creators, are available here: https://www. 
purpose.com/poland-air-pollution-campaign/. 


152 G. Pochwatko et al. 


[11]. The problem is particularly important in smaller towns where the impor- 
tant source of pollution is the heating of households and water from individual 
sources. One of the most dangerous pollutants is particulate matter, especially 
its fine fractions - PM2.5 [1,6,7]. Figure 1 shows how much household heating 
can contribute to a dangerous increase in PM2.5 levels. The chart shows the 
pollution levels recorded in hourly intervals in summer and winter in one of the 
Polish small towns (up to 25,000 inhabitants), which is dominated by single- 
family housing with its own heating sources. The PM2.5 level in the summer is 
low. In winter, however, it exceeds the norm many times in the afternoon and 
in the evening, when citizens usually return to their homes from work and turn 
on the heating. 


1.2 Air Pollution Representation 


The choice of the pollution representation is crucial considering the fact that the 
current methods of presenting information on the air quality turned out to be 
ineffective or gave inconclusive results (see e.g., [10]). For example, Campaña and 
Dominguez [3] proposed to present the different types of pollutants as enlarged 
models of chemical molecules. In another case, Prophet et al. [13] encouraged 
participants to take care of air quality in a symbolic way - by taking care of a 
virtual tree in an application using the data of the local pollution measurement 
station. 

In our opinion, the choice of the correct representation of air pollution should 
not be arbitrary. We do the choice in a multi-stage process, which we started with 
a participatory workshop with potential users [8] and continued with laboratory 
research (see e.g., [12]). Initially, four types of visual stimuli were selected and 
tested: 2 positive vs. negative and 2 concrete vs. abstract. At the time when we 
were preparing and testing the prototype presented in this paper, the laboratory 
studies were still in progress, so for the purposes of the AR app tests, we chose 
a concrete and negative representation - enlarged PM particles. 


1.3 Augmented Reality 


Augmented reality (AR) is a technology that is gaining in popularity and has 
a chance to stay ahead of immersive virtual reality due to the availability of 
interesting solutions for mobile devices|9]. 3D and animated objects can be easily 
placed in real space thanks to toolkits such as Vuforia? which make developing 
AR applications easier for various context, varying for workplace instructions, 
entertainment to product design. 

Vuforia is an AR add-on to one of the commercially available development 
engines for creating games, called Unity3d. The principle is that with the help 
of an application designed for a mobile device, an additional layer of abstraction 


2 Vuforia is a comprehensive software development kit (SDK) for Virtual Reality appli- 
cations available here: https: //developer.vuforia.com/. 
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can be added to the area seen by the camera. In this way, the visible area can 
be expanded with graphic elements, usually in the form of useful information. 

For this to happen, it is necessary for the application designer to design and 
prepare (in appropriate formats) two types of graphic elements. The first is the 
so-called marker, otherwise known as a trigger - that is, an object to whose pres- 
ence in the field of view of the camera the application will be “sensitive”, and the 
second is a target object (static image, animation, 3d model and many others) 
which appears in the field of view at or near the predefined marker (Fig. 2). 
Today, AR technology is increasingly used in industry, social media, and every- 
day products, among others. This is due to several basic needs that the market 
places on this technology, i.e.: product recognition and training, product visuali- 
sation, customer self-service, guided user manuals, and part recognition [14]. The 
increasing use of AR apps has been noted by several online industry journals. 
Insider Intelligence [15] notes the increasing share of users using augmented real- 
ity functionality, and Forbes [2] notes that this technology could be the future 
of social media and beyond. 


Fig. 2. First draft of the VAPE AR app: scanning QR code to obtain the app (left), 
scanning the trigger to launch Vuforia AR contents (center), exploring AR contents - 
pollution representation overlay (right). 


2 VAPE Augmented Reality App Design 


2.1 Purpose of the AR Application 


The aim of the design of the application was to create an engaging experience 
presenting air pollution with visual cues and to communicate scientific data in a 
comprehensible way, by doing so, potentially raising awareness of environmental 
and health-related issues caused by air pollution. The application was designated 
to be pretested during a scientific event held in Myszków (Poland), with the 
intention of further development in the future. The constraints affected by the 
out-doors-held event called for an easily accessible, compact and portable setup 
in addition to minimum device requirements. 


3 Myszków is a town with one of the highest air pollution levels in Poland and Europe. 
It was on the list of 50 cities with the most carcinogenic air in Poland. 
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2.2 Accompanying Poster 


The intention of creating an informative poster complementing the experience 
with scientific data has affected the decision to employ a trigger-activated AR 
system as a base design of the application. A 100 cm per 200 cm sized roll-up ban- 
ner commonly used in advertising at events was chosen as the print medium for 
the informative poster. The medium allowed easy and self-contained construc- 
tion, a great level of portability, and re-usability, additionally to the vast canvas 
surface. The poster was designed and prepared for print in Adobe Photoshop 
(CS6), composing hand-drawn digital illustrations and text created beforehand 
in Procreate. The overall design of the poster was intended to be simple with 
key scientific information, not to overwhelm the participants of the event and to 
encourage engagement with the application (Fig. 3). 
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Fig. 3. Testing the prototype - the trigger (left) and a user scanning the trigger (right) 


2.3 Implementation of Air Pollution Visual Representation 


The VFX representing air pollution was created in Unity’s Visual Effect Graph. 
The node-based Visual Effect Graph allows the creation of procedural animations 
of a vast number of real-time rendered particles. The particles were set to spawn 
evenly within a cube shaped-region of base measuring 6 m by 7m and reaching 
5m high above the ground. The particles were spawned constantly with their 
basic life-span set to 10s (not counting the added variation), and the overall 
capacity of particles of the VFX was set to 100000. The animation of the particles 
was driven by a turbulence node effecting in an appearance of the particles 
floating freely in the air. Each of the particles was assigned with a random 
texture from a set of 16 images using a flipbook particle output method. The 
images were edited microscopic photographs of actual air pollution particles. 
The assigned textures were given random sizes ranging from 0.5cm to 2cm. A 
gradient of colours from dark grey to black was assigned to the particles. 
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In order to raise awareness of health-related issues caused by air pollution, 
a visualisation of lungs has been chosen additionally to the visualisation of air 
pollution particles. The 3D model of lungs implemented in the application has 
been generated by segmentation of trachea and bronchi based on a computed 
tomography medical scan using 3D Slicer (4.11.20210226) software. The gener- 
ated 3D model was edited and retopologiesed using Pixologic Z-Brush (4R7). 
The model was exported to Unity using an fbx file. The model was enhanced 
with a VFX animation of the same air pollution particles used in the air float- 
ing VFX. The animation shows the particles streaming through the trachea and 
obscuring the lungs (Fig. 4). The 3D model was visualised and augmented in 
front of the poster. 


2.4 Application Development and Implementation 


The application was compiled in Unity (2021.2.0b12) Game Engine with a Vufo- 
ria (10.0.12) plug-in installed. The Vuforia plug-in allows easy setup of AR 
trigger-activated system. The visualisation augmented upon the surroundings 
of the user was set to be triggered by targeting the smartphone’s camera on the 
illustration of lungs presented in the centre of the informative poster. The scene 
build in Unity contained a Vuforia image trigger setting, animated Visual Effects 
(VFX) representing air pollution particles, and a 3D model of lungs with its own 
VFX animation. The application was built for android devices and preinstalled 
on a smartphone that was later used as a visualising device during the event. 
The presented application was a prototype and was not available for download 
from the internet by the participants, yet such an option of dissemination is 
possible in the future. 


Fig. 4. The animation of the particles entering via the trachea and obscuring the lungs. 
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3 Pretest Results 


The prototype of the AR application was presented to volunteers, participants 
of an open event promoting pro-ecological behavior in the city of Myszkow. The 
event was co-organized by local authorities and non-governmental organizations. 
Due to the restrictions caused by the COVID-19 pandemic, all events were held 
outdoors with appropriate precautions. The presentation of the prototype was 
one of the activities that the participants of the event (mostly parents with 
children) could get involved in. In order to better simulate the conditions in 
which the application will be used in the future, it was not announced in the 
information inviting to the event (posters hung around the city and spots on 
local TV). Participants could take part in a pretest, which was organized as 
one of many stands (e.g., art classes with local artists, Storm Hunters informa- 
tion stand, non-contact sports activities, 3D printing show). Participants who 
decided to test the AR application were encouraged to provide feedback to the 
researchers operating the stand. Many participants of the event spontaneously 
reacted to the information posted on the poster/trigger. As intended, they tried 
to scan the trigger on the poster with their own phones. In such a situation, 
the experimenters approached potential users, explained that it was a pre-test 
of the prototype and, for security reasons, it was not possible to use their own 
devices yet, but they could see how it works on the device provided for this pur- 
pose. They then showed participants how to scan a trigger on a poster, handed 
over a phone with the app installed, and began observing their interactions. If 
test subjects watched only the poster and lungs in the AR for a long time, the 
experimenters encouraged them to turn around and observe the area with the 
layer of pollution overlaid. Most of the participants spontaneously made such an 
exploration. Careful observation of the lungs with pollutants entering them was 
rather avoided. Most users focused on watching pollution particles in the air. 
Below are representative testimonials from participants in the pretest: 


«All this dirt makes a terrible impression. I would definitely like to have 
such an application to check whether, for example, I can go on a trip with my 
children.» 

«I didn’t know that so much dust was flying in the air, especially since it’s a 
nice day today, you can’t smell the coal smoke» 

«A man wants to do something for his health, he runs, exercises, and here 
you have something. If you check «on the portal», you can’t see it.» 

«Maybe it would make people realize that you can’t burn just anything» 

«Someone would have to want to use it. Some people don’t care.» 


Thanks to positive and negative comments from users, we have the oppor- 
tunity to develop the application. There is a need to conduct broader research, 
which would also indicate ways to popularize the application and ensure its 
usability, information value and impact on changing user behavior. 
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4 Conclusions 


Visualising key data with mobile AR is a reliable way of grounding the under- 
standing of information within the context of the environment it is displayed 
over. In this way, people living in a certain area may better realize the invisible 
presence of air pollution and how it may affect their health - but, as our subjects 
mentioned, only if they care to stop and check. Still, we hope that our applica- 
tion prototype will serve as a proof-of-concept for larger information campaigns 
on air pollution so that they are created in a more impactful and understandable 
way, relevant to their target users. 
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Abstract. Millions of Ukrainians are fleeing the war to the EU countries; in par- 
ticular, Poland has accepted more than three million refugees, among whom are 
mostly women and children. Some of them continue to work remotely in Ukraine 
and communicate with friends and relatives at a distance through technology- 
mediated ways. The aim of our study was to observe the role of perceived copres- 
ence during social contacts, both of professional and private character, and val- 
idate the Ukrainian version of the copresence scale among Ukrainian migrants. 
We collected 221 responses in an online and face-to-face study from Ukraine 
migrants. Analyses revealed that the Ukrainian Perceived Copresence scale has 
one factor and appropriate internal consistency. Perspectives of future research 
are proposed. 


Keywords: Copresence * Ukrainian > Migrant * Mediated communication 


1 Introduction 


2 Theoretical Context 


2.1 War Migration from Ukraine in Poland 


Migration makes it necessary to adapt to a new social environment and at the same time 
to maintain distance communication with those who remained in the previous place of 
residence, for this migrants use various technologies to mediate communication. The 
wave of mass migration from Ukraine as a result of the war and the military aggression 
of the Russian Federation became a necessary means of preserving life. As of mid- 
September, according to the UNHCR [10], more than 7 million temporarily displaced 
persons from Ukraine are in European countries, 1.3 million of them are in Poland. 
Since the beginning of the full-scale offensive, more than 6 million people have crossed 
the border from Ukraine to Poland. Distant communication for work and private con- 
versation became an important part of the Ukrainian migrant community in Poland. 
As communication is needed for mental health, research on the copresence effect in 
communication is actual, but for these purposes, Ukrainian language methods must be 
developed and adapted. 
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Fig. 1. Mediated communication (photo by Julia M. Cameron; pixels.com) 


2.2 Copresence 


Copresence has become a popular research topic in recent years, especially after the 
COVID-19 outbreak which forced millions of people to stay in isolation for many days 
or weeks. The term “copresence” was suggested to mark the quality of a communica- 
tion medium in human-human or human-machine interaction [7] and is defined as the 
degree of person-to-person awareness which occurs in the computer environment [9]. 
Our previous study [8] has proved a relationship between the amount of interpersonal 
communication, housing conditions (a shared or private room, number of children and 
adults in the household), and copresence with mental well-being in confinement during 
the COVID-19 lockdowns. Here, we present the validation of the Copresence scale [8] 
to Ukrainian language, which was used in a study that aimed to replicate these results in 
another special context of forced disrupted social relations, this time due to migration 
forced by war. This “war migration situation” has put millions of migrants in new hous- 
ing conditions, at a distance from their work and relatives, and in emotional positive 
and negative contacts. Here we describe a study carried out from May to September 
2022 in Poland, in which we aimed to validate the Ukrainian version of the copresence 
measurement instrument. 


2.3 Mediated Communication and War Migration 


Chen’s study of the role of internet communication in migrant adaptation shows that 
immigrants who remotely communicate online more frequently with relatives and 
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friends in their original country are less adaptive in terms of sociocultural adapta- 
tion. However, communicating with relatives and friends in the original country is 
no longer a significant predictor of intercultural adaptation when introducing demo- 
graphic characteristics into the regression analyses: no significant impact is identified 
for online communication with fellow immigrants on intercultural adaptation [3]. Cop- 
resence may be important to overcome the isolation of people using VR technology as 
it was approved for anxiety disorder [6]. Migrants experience “co-presence” with their 
loved ones through social media. Among second-generation Turkish-Dutch migrants 
who grew up in the Netherlands and migrated to Istanbul in adulthood [1] are in two 
different models of communication: ambient, fast-paced, background communications 
and also more direct, immersive, conversational modes. The careful shifting between 
these modes of social media communication produces migrants’ experiences of transna- 
tional emotional intimacy when emotion is mediated through digital media and gives 
psychological support to migrants. During the war and migration of Ukrainian family 
members, distant communication is the factor that influences the moral and psycholog- 
ical state of the combatant when family problems impact the psychological state of the 
service members during deployment in active military zones of anti-terrorist operations 
[4,5]. 


3 Method 


3.1 Sample and Data Collection 


The study was carried out online in the form of a self-report questionnaire. Data were 
collected from May to September 2022 among Ukrainian adult migrants. The study 
obtained approval from the ethics committee of the Institute of Psychology at the Pol- 
ish Academy of Sciences. Participation in the study was completely anonymous and 
voluntary. 221 Ukrainian migrants participated in the study (195 participants had valid 
data in both private and work-related conditions; 196 and 219 respectively), and the 
majority of them were women (72%). 78% of migrant participants entered Poland after 
the 24th of February 2022, after the Russian military invasion of Ukraine. One per- 
son did not want to reveal their gender, and three persons were nonbinary. 56 people 
were interviewed face-to-face and others via the internet through various platforms and 
social media groups to reach as heterogeneous groups of potential participants as pos- 
sible. The average age was 38 (SD = 13, min = 20, max = 72) for Ukrainian women 
and 31 (SD = 13, min = 18, max = 71) for UA men. Ukrainian participants lived before 
migration in a city (48%), 34% in a town, and 16% in a village; after migration 85% 
in a city, 12% in a town, and 3% in a village. Since the study was carried out online 
mainly among volunteers, it was impossible to avoid the selection bias completely. 


3.2 Measures 


Copresence. We used the Polish adaptation of the Social Presence Survey from Bailen- 
son, Blascovich, Beall, and Loomis [2] in a variant by Swidrak, Pochwatko, Matejuk [8] 
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for co-presence measurement. Participants filled the scale twice, separately for work- 
related (variable copresence-work) and private (copresence-priv) communication. Par- 
ticipants responded on a Likert scale from I (I strongly disagree) to 5 (I strongly agree). 
The score of each variable was calculated by extracting the mean of all items. The 
Ukrainian version of the Perceived Copresence Scale (PCS-U) was developed by 3- 
steps translation of instruments procedures. First, the original Polish PCS was trans- 
lated into Ukrainian by two professionals (a psychologist, and a software developer) 
from outside the research team, both of whom were bilingual and fluent in Ukrainian 
and Polish. Next, the two translations were compared, and discrepancies were recon- 
ciled until one working draft in Ukrainian was achieved. Second, a bilingual expert 
panel consisting of two original translators and two science researchers reviewed the 
draft Ukrainian translation to make cultural adaptations as necessary. Third, the cor- 
rected Ukrainian scale was back-translated into Polish by one bilingual translator. The 
back-translated Polish version was then compared to the original Polish version and 
reviewed by the original author to ensure that the questions were translated correctly 
and a cultural and language equivalency was reached. 


Housing Conditions. We controlled whether the participant lived alone or with others 
by asking about the number of adults, children, and small children living in the same 
household. In analyses we used two variables: the number of adults living with the 
participant (variable cohab-adult) and the total number of children (variable cohab- 
kids). We also inquired whether participants had their own room or had to share it 
(variable ownroom, 0 - no, | - yes). Communication conditions were: the number of 
hours spent at work on mediated communication (h-work), the quantity of video-calls 
(video-work), and the speed of the internet connection (internet-speed). 


Hours of Private Calls and Hours of Work-Related Calls. To measure the quantity of 
mediated communication, we asked participants to estimate the number of hours they 
spend daily on private and work-related communication using: a virtual reality head- 
mounted display, a computer, and a tablet/smartphone. In the next step, we summed the 
hours separately for work-related and private calls (variables h-work and h-priv). 


Internet and Electronic Devices. The Internet connection speed was measured with 
a single-choice question: My internet connection is (1) slow, (2) average, (3) fast, (4) 
very fast (variable: internet speed). We also asked about the percentage of calls in both 
contexts being video calls (variables: video-priv and video-work). 


3.3 Data Analysis 


Data analysis was carried out in SPSS version 28. We have used descriptive statistics 
to analyse the demographic data and housing conditions, a principal component anal- 
ysis (PCA) to test the factorial structure of the Ukrainian Perceived Copresence Scale; 
and calculated the Cronbach’s alpha to measure its reliability. We took an explorative 
approach, following Gellman and Hill [2]. To test the external validity of the scale, 
we calculated the correlation between the number of hours spent on work-related and 
private video calls. 
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Table 1. Means and standard deviations of co-presence. 


Variable UA migrants (N = 221) 


Co-presence 


Copresence-priv | 2.21 | 1.10 
Copresence-work 2.05 0.96 


4 Results 


4.1 Descriptive Statistics 


Housing Conditions and Internet Usage. Housing conditions of Ukrainian migrants 
differed: 30% of them were working in an office and going to work each day, only 
8% worked remotely and 10% worked in a hybrid form (partly remote and partly per- 
sonally); 22% were students of various levels and forms of learning, 34% were tem- 
porarily unemployed, home caregivers or pensioners. Only one in ten respondents lived 
alone and 28% had own room; 24% lived with a partner, 38% lived with two or three 
people. Regarding the cohabitants, two thirds of participants lived with children, 21% 
with small children, 41% with one child. Majority of migrant participants (74%) had 
mobile/phone internet connection, only 4.5% used a cable internet connection, and 
2.3% had the optical fibre connection. Only 1.5% of respondents declared no internet at 
home. Overall, the vast majority have a fast or rather fast connection (8% very fast, 44% 
fast, 38% average/medium), with only 9,5% declaring slow connection. Reliability of 
internet connection was rated as relatively low with half of the sample responding that 
it was sometimes interrupted, and was slowing down; 38% that it was often interrupted. 


4.2 Validation of the Ukrainian Perceived Copresence Scale (PCS-U) 


The PCA was used to verify the factor structure of the Ukrainian scale. Bartlett’s test of 
sphericity was significant (x? = 1219 p <.001) and the Kaiser-Meyer-Olkin test score 
was .84. The PCA revealed one factor for the Ukrainian version, explaining 67% of the 
variance. Reliability analysis revealed very high level of internal consistency of PCS-U 
(Cronbach's alpha =.928, valid cases N=195). The correlation between copresence and 
the number of hours spent on work-related and private video calls was .174 (p<.05) and 
.289 (p<.001) respectively (Table 2). 
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Fig. 2. Perceived Copresence Factor Analysis: scree plot. 
Table 2. Perceived Copresence Component Score Coefficient Matrix 
UA PL Component UA | Component PL 
1. B oco6mcromy 1. W sprawach osobistych... .134 .144 
CmJIKyBaHHİ... Byno BiruyTTa, | Czułem się tak, jakby ta osoba 
HiH HA JHONAHA HactpaBAl rzeczywiście była ze mną w 
Oy za 31 MHOIO B KiMHaTi pomieszczeniu 
2. A BiquyB, INO NA JIEOJUHA 2. Czułem, że ta osoba na mnie .152 .147 
NABATLCA Ha MERE i patrzy i jest świadoma mojej 
YCBINOMIJIEOE€ MOIO obecności 
IIpACyTHiCTb. 
3. Byxo BinuyTTa, Hióm expas |3. Czułem się tak, jakby ekran 150 „152 
KOMA IoTepa/rejrehoHy He Sys |komputera/telefon nie stanowiły 
6ap'epoM MIK Hamu bariery między nami 
4. A BirnuyBaB Omnu3bkicTP niei 4. Czułem bliskość tej osoby 150 158 
JuO0 JMU 
5. Y qimopomy cuimKyBanui... | 1. W sprawach zawodowych... | .154 .143 
By ao siguytra, HiÓn nA Czułem się tak, jakby ta osoba 
moma HacipaBxi ya 31 rzeczywiście była ze mną w 
MHOIO B KİMHATİ. pomieszczeniu 
6. A BiquyB, INO NA MIOIMHAaA 2. Czułem, że ta osoba na mnie .167 .163 
JMBUTLCA Ha MEHE i patrzy i jest świadoma mojej 
YCBINOMIJIEIOE€ MOIO obecności 
IIpACyTHiCTb. 
7. Byrmo BiruyTTa, si6u expan |3. Czułem się tak, jakby ekran 157 157 
KOMT IoTepa/rejrehoHy He ÓyB | komputera/telefon nie stanowiły 
6ap'epoM MIK Hamu bariery między nami 
8. Al BinuyBaB Ojrm3PKiCTP niei | 4. Czułem bliskość tej osoby 159 158 


JIIO JMU 


Extraction Method: Principal Component Analysis. 
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5 Discussion 


The first result of the study is the validation of the Ukrainian Perceived Copresence 
Scale (PCS-U) as a one-factor measurement instrument with appropriate internal con- 
sistency. This gives reason to think about the life situation of migration as one that 
changes the conditions of work and personal interactions and reduces the feeling of 
copresence in remote communication, at least during the adaptation period. It means 
that copresence plays different functions during work and private communications in 
conditions of war migration; its pattern analysis is the task of the next study. We need 
to research copresence in private and work communication during the collective trauma 
healing process. Types of trauma events and the character of conversations about events 
and emotional stress may be independent variances for future research. Mass migration 
during the war changed the importance of having a proper physical workspace isolated 
from other cohabitants to participate in work-related calls efficiently. Copresence in the 
context of post-war society transformation, peacebuilding, and construction of histor- 
ical remembering by war migrants, is an important subject for future research for a 
deeper understanding of copresence. There are several limitations to the study, such as 
not being representative of the Ukrainian migrant community sample, including gender 
representation. Future studies of migrants’ technology-mediated communication and 
well-being should focus on research in a more controlled study design. 
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Abstract. The Fourth Industrial Revolution causes changes in the econ- 
omy and shifts the job market demand towards workforce with high 
technical skills. To keep an undisturbed economic growth we have to 
encourage more young people to develop competences in STEAM (sci- 
ence, technology, engineering, arts and mathematics). This has met with 
a response from some education systems which have prepared special pro- 
grammes focusing on developing technical skills among students. One of 
the desired field is robotics, which involves constructing and program- 
ming. We have already conducted some workshops for high school stu- 
dents in this subjects and we would like to find the correct teaching 
tools to attract primary school students. Our idea is to create a modular 
platform, which elements could be used as black boxes, to teach robotics 
to young children. We have noticed that Arduino based kits are a bit 
too complicated and we decided to test a LEGO Technic set equipped 
with an external microcontroller. We have verified the interest level of 
children and the difficulty and time needed for a teacher to master the 
whole teaching platform. According to our study, LEGO attracts stu- 
dents much more than Arduino and is easier in operation and less time 
consuming during classes for teachers. 


Keywords: Autonomous robots * Recognition algorithms - LEGO - 
Robotics - Neural networks - OpenVINO - Microcontrollers - Python 


1 Motivation 


The 21st century brought us the Fourth Industrial Revolution (4IR). This indus- 
trial development is fueled by rapid changes in the digital technologies and indus- 
try, especially in the areas of artificial intelligence and advanced robotics. 4IR 
causes incredible modification of the global production and supply network by 
introducing smart technologies, wider communication between machines, inter- 
net of things (IoT), autonomous process monitoring, self-diagnostic etc. 

The transformation from traditional manufacturing to full automatisation of 
industrial processes totally shifts the role of human in the whole system; stronger 
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dependence on robotics moves us from the main workforce in a factory to aux- 
iliary function of maintenance and emergency service. Undoubtedly, this situa- 
tion has got many advantages like: less strains put on workers body, faster and 
cheaper production, better repeatability of a production process etc. Inevitably, 
the fast pace of drastic changes during the 4IR brings many challenges and dan- 
gers; especially socio-economic risks are raised in [10,11] by some authors. One 
of the problems that arises is lack of qualified workforce that can operate at the 
newly created intelligent workplaces. According to Spoetti and Windelband [14] 
vocational education and training of the workforce will be highly relevant to suc- 
cessful implementation of the 4IR. This burdens the sector of education with an 
expectation to follow the needs of the industrial development and employment, 
what has a strategic importance for a stable and undisturbed economic growth. 

The mentioned challenges affecting schools have been already spotted in some 
countries. In Poland, the Ministry of Education and Science has commissioned 
a study which defines an actual demand for particular professions and which of 
them the economy lacks the most. The published documents from 2021 [12] and 
2022 [13] confirm the statements from the study by Spoetti and Windelband [14]. 
Three professions occurred on the list each year: mechatronics engineer, automa- 
tion technician and robotics technician. In response, the Ministry of Education 
and Science in cooperation with the GovTech Center of the Poland’s Prime Min- 
ister’s office have created an education programme destined for primary schools 
called “The Laboratory of the Future”. The idea is to financially support entities 
that develop students’ competences in STEAM (science, technology, engineering, 
arts and mathematics). New workshops will be created, which should encour- 
age pupils to learn these subjects. A huge focus has been put on robotics, more 
precisely on a controller programming. 


2 Observations 


Our laboratory together with a student research group called “KNIK” from the 
University of Zielona Góra and our partner in the EU’s SpaceRegion project 
- IHP institute from Frankfurt Oder have organized in 2022 some controller 
programming workshops for high and vocational schools’ students. 

We have used Arduino starter kits composed of an Arduino Uno microcon- 
troller board, a breadboard, jumper wires, a power unit, transistors, sensors, 
LEDs, motors, resistors, buttons, an LCD display etc. All in all, it is a quite ver- 
satile but simple to use set to introduce pupils to electronics and programming. 
During some classes also custom made microcontroller boards designed by IHP 
have been used. 

In the previously mentioned “Laboratory of the Future” programme, schools 
are free to select appliances’ models, since only a mandatory equipment type is 
specified. A microcontroller board is one of them. We have assumed that many 
will choose Arduino based sets due to relatively low price, good availability, 
comprehensive documentation and huge community support. 

Not a lot of students have attended the organised workshops. From the des- 
tined group, only about 10% in the high school and about 20% in the vocational 
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school. The Arduino sets have seemed to be a bit too laborious and complicated 
to operate, since for many of them it has been a first contact with a microcon- 
troller programming. Also handling of the accessories with universal connectors 
is a bit different than what students are used to - connecting equipment with 
dedicated plugs. Additionally, they have been a bit older and more skilled than 
primary school students to whom the new ministerial programme is devoted. 
Another problem that can occur in many schools is lack of specialized teach- 
ers. A computer science tutor is skilled in programming but can have inadequate 
knowledge in electronics. Additionally, the time needed to put together a work- 
ing unit based on the Arduino is relatively long, and taking into account very 
complex primary school syllabus, effective utilization of lesson’s time is crucial. 


3 Idea 


In cooperation with our partner IHP we have looked for a solution which would 
appeal more to younger people. One of the most widespread toy types are bricks 
especially LEGO sets. Almost all children are familiar with them and they are 
generally appreciated. 

Another advantage of a LEGO set with a microcontroller is simplicity of con- 
struction process. All parts can be easily and tightly connected, and the amount 
of possible configurations is vast. It has to be underlined that for primary school 
students, the playing aspect is very important, and good entertainment can help 
to stay focused and encourage them to learn. From our observations, quite raw 
Arduino sets, very utilitarian in their nature, are not really rewarding for stu- 
dents. On the contrary, LEGO gives a possibility to make your own construction 
move by adding motors and sensors in the right spots. It gives more freedom in 
model creation and saves lesson time due to relatively simple way of connecting 
parts. Students can be very creative and engaged in building new constructions 
and simultaneously, without a lot of effort, learn how to design control systems 
and program them. 

For our tests, we have chosen a LEGO set featuring electric motors and a 
wheeled platform, see Sect. 4. Additionally, some external electronic parts and a 
control software written in Python have been implemented. We believe that these 
elements are relatively easy to handle, and the learnt skills would be useful at 
the job market. We had decided to give the construction and programming tasks 
to a PhD student inexperienced in such subjects to mimic a teacher’s situation 
and to verify how challenging and time consuming is gaining a new knowledge. 
Young students reaction and interest level have been checked during a public 
science exhibition at IHP institute in Frankfurt. 


4 Construction 


A robot that has been created by the PhD student has been a wheeled platform 
with acamera. He has used some out of the box parts to simplify the development 
process and to make it more reproducible. The construction components are 
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shown in Fig.1. The entire based structure has been assembled with elements 
from the LEGO Technic 42114 Volvo 6 x 6 set (Fig. la [4]). Motion system has 
consisted of four Lego Large Motors 88013 (Fig. 1f [3]) installed on the platform. 
They have been connected to a Raspberry Pi Build HAT (Fig. 1h [7]) which is 
an extension (shield) for the main computer - Raspberry Pi B4 (Fig. 1d [6}). 
An Intel Neural Compute Stick 2 (Fig. 1c [1]) and a Raspberry Pi Camera HD 
v2 (Fig. 1g [8]) have been linked to the main computer. The power has been 
supplied by a LiFePO4 battery (Fig. le [2]) connected to a step-down Voltage 
Inverter LM2596 (Fig. 1b [9]). 
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Fig. 1. Used components. 


Only the compatible components have been selected, thus assembling of the 
entire platform has been relatively straightforward. A first three-wheeled proto- 
type had been constructed within one week. It had been rear-wheel drive, and its 
drivetrain had included a front-mounted self-aligning wheel. Unfortunately, such 
a solution had encountered some mobility problems, so it has been decided to 
transform the entire structure to a four-wheeled vehicle. Still, a simple attach- 
ment of the wheels via LEGO cross axle has caused them to fall out of their 
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mounts during sudden turns. Moreover, a rather low torque and fast movement 
of the motors have made a precise control of the whole structure relatively cum- 
bersome. It has oscillated or on the grippy surface has not moved at all. To solve 
these problems, we have applied a gear multiplier equipped with a dedicated hub 
for each wheel. In addition, we have had to upgrade the entire structure to make 
it more stable and to withstand the forces exerted by relatively heavy battery 
and electronic parts. A final construction is shown in Fig. 2. 
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Fig. 2. Final construction. 


The Raspberry Pi 4B has been the main computer board for this setup. 
It has been extended with the Raspberry Pi Build Hat which is a dedicated 
add-on board (shield) for the Raspberry Pi computer. Generally, this shield is 
compatible with various LEGO motors and can control up to four different units 
or sensors of distance, color, pressure etc. Moreover, it allows users to precisely 
control motors and read data from encoders i.e. an absolute motor position, 
its current speed or rotation direction. Additionally, a motors control via PWM 
signal is possible. A minimal possible rotation angle of used LEGO Large Motors 
88013 has been 6°. 

The Intel Neural Compute Stick 2 (Intel NCS2) is a plug-and-play USB 
hardware deep neural-network inference accelerator for computer vision and deep 
learning. The acceleration is working by assisting the main computer’s processing 
unit (CPU) by taking over mathematical calculations required for running deep 
learning models. Moreover, the Intel NCS2 is supported with an Open VINO 
(Open Visual Inference and Neural Network) library, which includes multiple 
pre-trained models e.g. face or text detection, formula recognition etc. A wide 
variety of them allows users to choose the most suitable for desired tasks. The 
biggest advantage of Open VINO is its application simplicity. This library can be 
used as a ready-made module. A person without specialized knowledge is able 
to apply it without modifications and understanding of its working principles. 
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Inexperienced students can treat it as a black box and quickly create a fully 
functioning device. 

For people and facial recognition task we have used the compatible Rasp- 
berry Pi v2 camera with an 8 MPx resolution. This device works with drivers 
included in Raspbian - Raspberry Pi’s operating system, moreover it has hard- 
ware support what totally limits a consumption of CPU’s computing power. 

The whole system has been powered by LiFePO4 battery connected to the 
step-down Voltage Inverter LM2596. A connection diagram is shown in Fig. 3. 


Raspberry Pi 
BuildHAT 


LiFePO4 
battery 


Voltage 
Inverter 


Raspberry Pi 
4B 


Intel NCS2 


Fig. 3. Connection diagram. 


5 Algorithm’s Working Principles 


The main task of the used algorithm has been recognizing a human posture 
and robot’s movement control. It has had to follow and hold a desired target (a 
human) in the middle of the observed area. 

We used an OpenVINO pretrained model from the Object Detection Models 
library called “person-detecion-retail-0013” [5]. The control algorithm has been 
designed to center the detection frame in the middle of the observed area and 
then try to follow the target and hold it in that position (Fig. 4). The size of 
the detected target has had to be kept within defined limits what has forced the 
robot to zoom in or out by moving forward or backward. 

Software code changes implemented by users can be kept to a minimum. It 
is only necessary to declare the camera resolution, the desired data detection 
frame size and the motors’ speed. The whole algorithm works in real time. Users 
have to bare in mind that a wide accepted model size limit gives better error 
tolerance but can cause some control problems like lack of a trigger to move. On 
the other side, a too small limit can cause permanent movement due to defined, 
finite minimal possible motor’s rotation angle. The software gets caught in an 
infinite loop while trying to center the detection frame because the minimal 
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Directionjof move 


Detection|farea 


Fig. 4. Visualization of the algorithm. 


movement value is bigger than the accepted margin. A similar situation will 
occur if the engines’ speed is too high; the control unit is not able to proceed 
collected data fast enough and to send control commands on time. Also a too 
high video resolution can cause delays between target recognition and motors 
movement due to limited CPU's calculation power. 

Generally, the camera constantly sends data to the computation unit. When 
a target occurs in the detection area and is recognized, the main computer 
transmits a control signal to motors via the Raspberry Pi Build HAT. The motors 
are running until the target's frame is centered in the middle of the detection 
area. The platform rotation is caused by movement of the wheels placed on 
different sides in the opposite direction. Afterwards motors shut down and wait 
until target leaves the center of the detection area. Moreover, the robot tries 
to keep the correct target size by moving forward and backward by rotating all 
wheels in the same direction. The robot follows a detected person to keep him 
within a certain distance. This has been very encouraging for children, who have 
thought that the robot has been scared by them and has run away to keep a 
save distance every time they have come closer. 

All pretrained models are implemented in .xml and .bin files and can be 
used like black-boxes in the modular fashion. Their content do not need to be 
changed by users. E.g. changing our software to recognize particular objects 
instead of people comes down to replacement of the complete .xml and .bin files 
to ones containing the correct detection model. This totally simplifies the soft- 
ware development and can be quickly executed even by inexperienced primary 
school students guided by a teacher who has completed only a short training. 


6 Tests 


Real life tests have been conducted during the Doors Open Day 2022 at IHP 
institute. The robot has been placed at the floor next to a stand. Other experi- 
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ments have been also presented in the same space. The forward-backward oper- 
ation feature has been disabled to prevent the robot from moving away from the 
stand and colliding with someone. 

The reaction of the targeted age group - children aged 7-15 years has been 
very enthusiastic. The bright yellow colour of the robot attracted their atten- 
tion. Smaller children have been mostly interested in playing with the robot - 
moving from side to side and observing how it has followed them. Older primary 
school students generally have wanted to know how the whole thing works and 
what more it could do. We have received questions about used components and 
software. All in all, the presented robot has drawn much more attention and 
vigorous reaction than any of the Arduino sets used during the organized by us 
workshops mentioned before. 


7 Conclusions 


Our observations show that teaching primary school students using LEGO sets 
with additional electronic extensions seems to be a much better idea than intro- 
ducing them to the Arduino. LEGO is well recognised among children, they 
know how to use it and what can be constructed with it. Younger children asso- 
ciate it with good fun and creativity freedom, older ones additionally with com- 
plex mechanics and possibility to control the components movements of LEGO 
Technic sets. Complementing LEGO with external microcontrollers and simple 
artificial intelligence creates a perfect educational set for primary school stu- 
dents. They can learn programming through playing and improve their creativity 
and technical skills by building diverse interactive constructions fulfilling vari- 
ous tasks. These can encourage them to develop competences in STEAM what 
is the aim of the “Laboratory of the Future” educational programme and will 
be desired by the economy changed under the influence of the Fourth Industrial 
Revolution. 
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Abstract. With the advent of new interfaces and modes of interaction 
related to virtual, augmented and mixed reality (VR, AR, MR) or voice 
user interfaces (VUI) a need to explore new approaches to foster rapid 
software prototyping and development emerges. Drawing from our expe- 
riences in human, cooperative, and collaborative aspects of software engi- 
neering, based on IT-empowerment and participatory design with older 
and younger adults, we propose and examine some methods, practices 
and tools for co-designing software for the immersive extended reality 
continuum (XR), including VR, AR or VUIL. In a series of empirical and 
field studies we examined various experimental setups for stakeholders’ 
participation within and across different steps and phases of the soft- 
ware development process for immersive extended reality environments 
(IERE). In this article we provide an overview of selected methods, prac- 
tices and tools, that we found best support communication, collabora- 
tion, and cooperation among stakeholders, especially the members of 
vulnerable groups who are often excluded from the main technological 
discourse and need more empowerment. 


Keywords: Virtual Reality - Participatory Design - Software 
Development 


1 Introduction 


Intensive growth of virtual, augmented and mixed reality solutions opens up 
new possibilities for better immersive experience of software solutions for end 
users. However, the rapid growth of those areas that comprise the immersive 
extended reality continuum (XR), brings new challenges to the software devel- 
opment process and teams. Despite the wide range of existing methods, tools 
and approaches for developing diverse mobile and desktop environments, includ- 
ing user-centered design coupled with end-users participation, i.e. co-design and 
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participatory design approaches, there is still a need to select and tailor the most 
promising solutions to new interfaces and modes of interaction of the XR contin- 
uum. Recently increasing pace of both hardware capabilities and its proliferation 
puts pressure on software developers to deliver more content as more diverse 
groups of users enter the market for Immersive Extended Reality Environments 
(IERE)!. Therefore, the objective of this paper is to present our findings regard- 
ing the methods, practices and tools aimed to facilitate direct involvement of 
users of varying ICT-proficiency levels, particularly from vulnerable groups, in 
the participatory design process for the development of new interfaces and modes 
of interaction related to virtual, augmented and mixed reality (VR, AR, MR). 
In a series of empirical and field studies [1,2,6,9-11, 18,23] we examined various 
experimental setups for stakeholders’ participation within and across different 
steps and phases of the software development processes. Therefore, we provide 
an overview of selected methods, practices and tools, that we found best support 
communication, collaboration, and cooperation among stakeholders, especially 
members of vulnerable groups who are often excluded from the main technolog- 
ical discourse. 


2 Related Work 


Participatory design, or co-design, engages users in the development processes 
[20,21], allowing them to directly shape the designed solutions at their core. 
Users can provide insights at different stages of the process [5,15] - or, according 
to Ladner [14], become designers themselves, in Design for User Empowerment. 
Without this, even solutions meant to work towards Social Good can be tinted by 
stereotypes [24], which can become evident in unexpected places due to uncon- 
scious biases held by the project team, putting into question the ethical aspects 
of such solutions, as some some stereotypes may trickle down into the immer- 
sive environments built. [4] To prevent this, user engagement during the design 
and development is crucial. One way of approximating it is by forming Living 
Labs. The term Living Lab itself was coined by William Mitchell from MIT [16] 
and indicated a space where routines and everyday life interactions of users and 
new technology could be observed and recorded [3,22], to examine the users’ 
needs and avoid business risks [19]. However, besides problems with maintaining 
long-term and sustainable user communities [17] a realistic Living Lab, requires 
significant investments. Some alternatives, like lightweight living labs [9], were 
proposed where users were animated with interesting activities which keeps them 
engaged. This approach is also useful for new and emerging interfaces such as 


1 In this paper we define Immersive Extended Reality Environments (IERE) 
as environments which use new technological solutions running software to enhance 
or replace the experience of reality by appealing to the users’ various senses. What 
these environments have in common is the use of the real life metaphor, thus making 
participatory design rely more on ethnographic studies which evaluate and elaborate 
on the nature of users’ interactions with technologies which have a chance to become 
truly embedded in their daily lives. 
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extended or mixed reality (XR), like VR, AR or voice interaction. There is also 
a lot of room for further research to establish a comprehensive and ubiquitous 
interface that combines all of the extended reality technologies. In this vein, it 
is possible to engage users in a distributed Living Lab environment [11] aimed 
at testing and developing these solutions, and eventually even engage them in 
start-up development teams, as in the SPIRAL method described in one of the 
previous CHASE workshops [9]. Our RAPID approach draws from all of these 
methods, while opening the discussion for further exploration of new and emerg- 
ing interfaces on the extended reality continuum in the context rapid software 
development. 


3 RAPID Approach Outline 


Based on our previous experience with empowering older adults and other vul- 
nerable groups for participatory design, we developed an instant environment 
called RAPID (Rapid and Agile Participatory Interactive Develop- 
ment) devoted to lightweight Living Lab conditions. The RAPID approach 
consists of three phases divided into steps and for each of these we list related 
methods and expected outcomes. 


3.1 I. Preliminary Phase: Team Formation 


This preliminary stage consists of two steps, that can be realized in a few days 
or partially omitted, in particular step 1. in experienced teams or even step 2. 
based on communities such as Living Labs or local activity groups. 


Step 1. Core XR Team Formation and Empowerment. Internal work- 
shops with methods presentation, discussion and roles assignment. This is an 
important step when we prepare the team for direct cooperation with users as 
they may hold some stereotypes about them as well. Internalization of methods 
is important, especially to ensure unbiased and open cooperation with vulner- 
able users and among each other in the in-team role assignment, i.e. product 
owner, team leader, technology advisor, developer, designer etc. 


Methods and Tools that Can be Used: 


— evidence from previous successful projects with direct user participation in 
different contexts, i.e. videos, artifacts (actual applications, products) which 
may come from outside of the ICT-area, for example, from participatory 
Social Design projects, 

— interviews with designers successfully engaging in such projects, sharing the 
benefits and challenges from their points of view 

— ideas about what the team knows about the users, and what is unknown, 
which can later be verified 
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— brainstorming for ideas and insights about the solution with core team mem- 
bers, further facing their stereotypes of the other group (users), 

— affinity diagramming for collection, categorization and analysis of ideas, 

— mind mapping for methods, ideas, team roles and responsibilities. 


Expected Result: Team members understand the importance and goal of the 
process and are open to direct cooperation with end-users, while remaining aware 
of their unconscious biases. 


Step 2. XR Team Extension: User Engagement. Preliminary trials with 
representatives of the group of end users, user engagement, recruitment and selec- 
tion in order to extend the core team (from local communities, groups, Living 
Labs). User engagement session. Preliminary trials with end-users as potential 
team members. Controlled audio-visual immersion into the IERE world. Direct 
interaction (i.e. controllers) optional, not required. 


Methods and Tools that Can be Used: 


— showcasing interaction with various IERE interfaces (Brain-Computer Inter- 
face (BCI), Voice User Interface (VUI) and various types of XR solutions) 

— brainstorming possible everyday uses of such interfaces to invoke creativity 

— engaging users in a simple, largely passive, VR environment (i.e. cardboard 
or video pass-through) 

— advanced XR environment (i.e. full headset or an immersive AR game) 

— fully fledged demos and trials (i.e. fluent and attractive) 

— semi-structured individual and group interviews with users, screening forms. 


Expected Result: Recruitment of two to six end-user representatives with high 
motivation and a creative flair for the main co-design phase session and team 
participation. 


3.2 II. Main Phase: RAPID IERE Development 


Main phase consists of two stages: user empowerment stage (steps 3 and 4) and 
co-design development stage (steps 5, 6 and 7). Each stage can be realized in 
one day, or extended as needed. 


Step 3. Introduction: Discussion of the Goal of the Workshops and 
Current Pre-workshop Expectations of Participants. An informative and 
motivating introduction is key, as both the potential users and the development 
team need to realize the purpose and importance of their work. Additionally, 
the participants ought to introduce themselves, to provide their background 
on technology use, and their experience with the specific topic and their own 
expectations of these workshops. When engaging vulnerable groups it is crucial 
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to not strain their resources and to ensure that the benefit is mutual for the 
parties involved. Next, in order to set baseline expectations, we introduce our 
participants to the idea of the project without putting it in the context of specific 
technologies nor equipment. 


Methods and Tools that Can be Used: 


— ice-breaking games to learn about everyone’s motivations and aspirations 

— division into sub-teams, each with at least one representative of a different 
group (target user, programmer or designer), to facilitate debates and allow 
everyone enough speaking time to voice their opinions 

— semi-structured interviews 

— brainstorming, affinity diagramming, mind mapping 

— co-creating mood and context boards exploring the potential of the project 

— sharing insights accross different sub-teams 


Expected Result: The participants understand the importance and goal of 
the workshops and share their experiences with the subject matter of the work- 
shop (e.g. in our cases: banking technology and ATMs, potential uses of Smart 
Home technology with VUIs and BCI, workplace stress, intrusive thoughts and 
relaxation techniques). 


Step 4. Immersion with Interaction: Hands-On Presentation of the 
IERE Technology. This step is crucial for the target group to get a feel of 
what is possible in the technology of their choice, so that they are not limited 
by their presuppositions or lack of experience in the development stages that 
follow. It may also be beneficial for the target group to witness the development 
team also discovering something new and engaging within the demos. Moreover, 
experiencing immersion may get the users excited about this technology, awak- 
ening their creativity. In this step the target users try IERE gear and engage 
in interactive activities, which to some extent may be similar to the ones being 
developed later, either in scale, means of interaction or goal (Fig. 1). 


Methods and Tools that Can be Used: 


— Showcasing passive commercially available low-cost products that are within 
their reach and their key functions: Cardboard 360 movies and experiences, 
XR experiences, VUI in their own smartphones 

— Engaging the users in active experience creation: capturing 360 photos with 
their own phones, setting functions of their own voice assistants 

— Explaining the idea of the Internet of Things and how experiences can be 
combined and co-dependent 

— Letting all of the participants engage with commercial higher quality games 
and applications available on higher-end devices, which display different and 
more-advanced aspects of IERE and the range of possible interactions with 
the real world and each of them through the Internet of Things 
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Fig. 1. In one of our workshops older adults played the NVIDIA® VR Funhouse game, 
to become aware of the wide range of what is possible in VR 


Expected Result: The participants understand the range of possibilities that 
exist in IERE and get excited about the technology. 


Step 5. Development - Environment in IERE. While working on any type 
of project that involves software engineering many environmental factors come 
into consideration, such as type of devices, software choices or general theme. 
While in our cases most of those factors were predetermined by the concepts of 
the project, hardware solutions available to us at the time and team’s skill, we 
were able to involve all of our participants in the process of designing XR playing 
space and brainstorming on the potential future uses of other IERE solutions, 
while also gaining valuable insights in their real life preferences. 


Concept Name: 


Fig. 2. We printed out VR Sketch Sheets, to be used for web-based XR environment 
design from https://blog.prototypr.io/, however, using them directly with members of 
our target group proved to be too reliant on an unknown metaphor and too detached 
from the familiar experience. 


Methods and Tools that Can be Used: 


— Semi-structured interviews 

— Showing existing large scale projects and their interactions, eg: Google Earth 
VR. Street View in different locations, engaging in unscripted interactions 
with audio assistants 
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— Evaluating existing solutions as design case studies: eg: Rating different 
aspects of the presentation of locations, talking with virtual assistants 

— Presenting the specific case and evaluating it in the same format 

— Using paper prototyping tools to jot down insights: XR Paper Prototyping 
with the use of Sketch Sheets 2)? 

— Creating flow diagrams to depict the discussed interaction methods and 
aspects 


Expected Result: The participants are involved in project defining decisions, 
can visualize and evaluate the suitability of different environments, weigh the 
dangers and benefits associated with each choice, and come up with their own 
ideas regarding environmental variables. 


Step 6. Development of the Concept. Working in IERE environments opens 
a whole new range of possibilities regarding UI and UX, as such interfaces rely 
on approximations of real interactions to a greater extent, therefore it is of 
utmost importance to have as much feedback as possible. By using quick pho- 
toshop mockups, 3D modeling software and interaction and flow mock-ups in 
web-based tools like Proto.io, we were able to gather important insights on what 
our participants expect from our projects, while allowing them to quickly see 
mock-ups of their ideas (Fig. 3). 


Fig. 3. Older adults were shown web-based VR mock-ups from Proto.io on smartphones 


Methods and Tools that Can be Used: 


— Paper prototyping of interface elements and their functionalities, which are 
later turned into quick clickable mock-ups on e.g. Adobe Experience Design 

— Proto.io for UX tests on the initial UI 

— Adobe Photoshop rapid UI prototyping 


2 The Sketch Sheets are available from: https: //blog.prototypr.io/vr-paper- 
prototyping. 
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— Unreal Engine as a quick means of creating simple VR project 
— Autodesk Maya 3D for 3D modeling purposes 


Expected Result: The participants conceptualize and create UI and UX ele- 
ments while having the ability to give immediate feedback to our rough inter- 
pretations. 


Step 7. Development - Functionalities. This step, as one too technical 
for easy and quick implementation, focuses solely on the participants’ ideas 
and insight without prototyping the functionalities there and then. Participants 
explore ideas and conceptualize their implementations and restrictions, to go 
along with the project concept and future and existing assets. It is important 
to give freedom to participants’ ideas and leave out any implementation restric- 
tions. While applicable pre-made sets of interactions can be presented during 
discussion as a starting point, it is not necessary. 


Methods and Tools that Can be Used: 


— Passive interactions with environments in Proto.io, Unreal or Unity 

— Wizard-of-Oz method 

— Brainstorming with mind-mapping the functions in each environment 

— Engaging with other fully functional IERE interfaces to remind of the possible 
range of interactions 

— Affinity diagramming of most commonly suggested concepts 


Expected Result: The participants conceptualize project functionalities, with- 
out software/engine restrictions. 


3.3 III. Closing Phase: XR Product Delivery and Testing 


This phase is from the classic user-centered design approach, so we do not elab- 
orate on it. 


Step 8. Testing with End Users. We would like to point out three important 
groups of methods, i.e. qualitative interviews and observations and quantitative 
sensor-based methods and tools. 


Methods and Tools that Can be Used: 


— Semi-structured qualitative interviews, 
— Hands-on usability tests, 
— Eye-tracking and other sensor-based quantitative methods. 


Expected Result: Verifying the results with extended team members and new 
participants, as well as the extensive network of contacts they all have to gather 
fresh perspectives. Testing is important and can be done outside of the work 
setting, by presenting the demo at a conference or attending fairs, events where 
members of the target group may be present. 
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4 Discussion 


Our RAPID approach was developed and verified in design cases, for example 
a series of design workshops of a virtual simulator of ATM and well-being XR 
environments [12], while its user-empowerment elements were tested in research 
related to Voice User Interfaces (VUI) [13] and Brain Computer Interfaces (BCI) 
[8] for IoT, as well as in a case study of an all round XR-hackathon with several 
dozens of stakeholders from various groups, including software engineers and 
content developers [6,7]. Overall, the RAPID approach is well-suited for creating 
XR solutions with users who need to be empowered, in our case older adults’, 
and we expect this to be true for other vulnerable groups at risk of being viewed 
through the lens of stereotypes or just feeling vulnerable in new circumstances. 


4.1 Phase I: Team Formation 


In the case of participatory design high motivation and engagement are crucial, 
which is why RAPID Phase I is essential for the success of the approach. Here 
are some takeaways from our practice: 


1. One good practice is to have the team discuss the most promising 
target user candidates, in terms of their involvement and the ease of col- 
laboration. As this approach is fast, it is important to gather people who 
enjoy working together, are interested in the project and want to express 
themselves. On one hand this may seem counter-intuitive, as we give voice 
to the target group members who are outspoken already, but on the other 
hand, they do belong to the target group, and they are more likely to be 
early-adapters, so we in a way, we do design for them when we work on the 
early version of the solution. 

2. We found that granting the potential target groups access to technology, 
which may otherwise be out of their reach (either financially, or because of 
the setup) is a very good way to guarantee engagement, and usually it is 
enough to convince them to participate in the development process. 

3. Vulnerable users, who are shown technologies they had no experience with 
before, need assistance from dedicated staff who encourage them and ensure 
the first impressions are smooth. 


4.2 Phase II Development 


After the users excluded from the main technological discourse join the develop- 
ment team, empowerment is even more important. They are now among people 
who are tech-saavy, driven and may dominate the discussion. Some takeaways 
from our studies indicate that: 


3 The groups needing empowerment may be not as obvious, as in different contexts we 
enter different relationships where knowledge or power are not equally distributed. 
This is the case with students vs teachers, employers vs employees but also clients 
and contractors or even scientists in interdisciplinary teams. 
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1. One strong need of our participants was to clearly understand the tech- 
nology and the nature of the development process, as well as the end 
goal of the project and each of the involved parties’ stake in it. For example, 
in XR workshops it was reflected in the preferred use of familiar terminology, 
such as “simulated” instead of “virtual”. 

2. Taming the technology: surprisingly every participant had almost no issues 
with getting used to VR headset and its controllers. The participants treated 
both stationary and room scale VR experiences as a novelty. They unani- 
mously agreed that room scale VR felt more impressive; however, some par- 
ticipants concluded that cardboards are better suited for beginner users. The 
same was true of using VUIs, but in this case some participants decided this 
solution could work for other members of the target group, but not for them 
- showing some prevailing intergroup stereotypes to be aware of, even 
when engaging in participatory design. 

3. Giving a place to start: The discussion inspired by Google Earth VR Street 
View in 3D was valuable, as it gave the participants space to imagine other 
interactive aspects of the possible solution, without the need to prototype 
it first. The users explored potential scenarios for the ATM use, as well as 
locations, pointing to specific places and recalling different situations, as well 
as mentioning their own insecurities about them. 

4. Drawing and prototyping: It is not necessary for the target group to 
draw and conduct prototyping by themselves, as designers may do this for 
them following instructions and descriptions they provide. Overall, Design for 
Empowerment, where users become designers, does not have to mean that 
they have to become craftspeople as well. For XR setting, paper prototyping 
did not seem to work for us, as the metaphor of changing 2D depictions into 
3D ones was too detached from the target project for users lacking experience, 
therefore we found low-fidelity prototyping to be better suited for generating 
insights. 

5. Not being shy about half-finished products: In our ATM case, thanks to 
the participants’ interaction with 3D mock-ups we gathered multiple insights 
concerning both the build in terms of product design of the ATM as well as 
its UI, discussed in the context of existing ATMs. We have also gathered ideas 
regarding preferred relaxation environments, including all immersive aspects 
such as sounds, colors, movement and avatar placement and their realism, 
down to the wind movement and animal species. 


4.3 Phase III Closing 


The UX tests were conducted both by the users and the development team in 
order to maintain the fresh approach and view on the different aspects of the 
ATM and its usability, controls, environment and the final purpose as a training 
simulation and the well-being environment (Fig. 4). 
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Fig. 4. Example of immersion verification with eye-tracking and psychophysiology 


4.4 Other Considerations 


Scheduling. RAPID is designed to be concise but scalable. This means, that 
it can extended or compressed to just five days, as follows: 


— Day 1 I. Preliminary Phase - team assembly and empowerment; 
Day 2 I. Preliminary Phase - user recruitment and engagement; 

— Day 3 II. Main Phase - user empowerment; 

Day 4 II. Main Phase - RAPID prototyping; 

— Day 5 III. Closing Phase - preliminary product delivery and testing. 


| 


Moreover taking into account the typical iterative nature of agile approach 
and rapid prototyping, some steps can be omitted i.e. core team empowerment 
or user recruitment, based on previous cycles of software development process 
or previous projects. Below we present a possible schedule of implementation of 
the RAPID approach as a sprint. 

It is vital to schedule the meetings with enough time for questions and digres- 
sions since the whole team and the users alike need to feel their presence and 
insights are of importance - to an extent even if they relate to the issues outside 
of the main goal of the workshop. While they may not be crucial for the end 
product, these questions are essential for the process that relies on open sharing. 
Often such workshops run long, so ordering catering is another good practice. 


Venue. The RAPID approach will work best if the venue changes, depending 
on the focus of each phase. 


— I. Preliminary Phase: while Stage 1 with the team can be conducted any- 
where, it is advisable to get out of the usual place of work to allow the team 
to think out of the box in a more creative setting, which does not resemble 
the regular work setup. Stage 2 on the other hand should be done in an open 
social environment, such as a fair, a community center or any other open 
venue in which potential participants may feel empowered to take the novel 
gear on a trial run. 

— II. Main Phase: This phase can happen in any flexible workshop space with 
the possibility to create a round-table to facilitate open discussion 

— III. Closing Phase: Ideally the venues in this case would be environments 
similar to where the end product is expected to be used. 
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4.5 General Discussion 


Despite the fact that many methods and tools were developed in recent years, 
there is still a room for discussion to propose a better, more comprehensive app- 
roach for the software development teams to address the challenges of constantly 
changing hardware landscape of IERE. For instance, many of use cases described 
at premier software engineering and HCI conferences were based on the rapid 
prototyping using cardboard platforms, thus became more or less obsolete or 
hard to implement, as support for these weakened or ceased. On the other hand, 
there are also some shortcomings of paper prototyping that are not easy to over- 
come, because new interfaces and modalities are not just a simple extension of 
the WIMP paradigm (windows, icons, menus, pointers). In contrast, methods 
for prototyping 2D interfaces are well-established in both software engineering 
and human-computer interaction with plethora of effective tools, including low 
fidelity mock-ups that can be prepared even by the end-users themselves. 

Thus, drawing from our research experience, we concluded that if even users 
who are generally excluded from the main technological discourse can creatively 
and competently engage in such 2D prototyping activities with a proper empow- 
erment. On the other hand, during our research on software engineering for 
advanced and multi modal IERE interfaces we observed that unlike prototyping 
mobile or web-based flat environments and applications there is a gap between 
proficient software designers, developers and end-users. This gap is evident in the 
field of paper prototyping of immersive interactive environments, which consti- 
tutes an important barrier to direct cooperation with end-users, content design- 
ers and software engineers. Even empowered users have significant difficulties in 
bridging the gap between low-fidelity paper prototypes and hi-fidelity immersive 
multidimensional interaction. This situation resembles the case of architectural 
design, where drawings and schemes are not enough for many end users to fully 
understand and imagine the final solutions, hence the industry practice of 3D 
visualisation of models and spaces, more recently with the use of VR-based solu- 
tions. In our case, the same metaphor applies, where prototyping for IERE has to 
“hit closer to home” to enable the users to fully understand the functionalities of 
end products and to be able to contribute feedback and relevant insights. These 
low-fidelity functioning prototypes are generated based on insights from the ear- 
lier steps in the development cycle, which rely heavily on the empowerment of 
users, consisting of demonstrations, free use of technology and discussions with 
the team, unhindered by fears and false preconceptions, and open to constructive 
criticism. 


5 Conclusions 


In this paper we outlined the RAPID approach with example methods and 
tools based on a series of case studies. It is an attempt to discuss and explore 
new approaches to rapid software prototyping and development relying on user- 
empowerment of users typically excluded from the main technological discourse 
and facing unconscious biases in the context of immersive extended reality envi- 
ronments (IERE). The proposed approach allowed us to gain valuable insights on 
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the development process of Immersive Extended Reality Environments, includ- 
ing such aspects as UX, UI, 3D models, web-based mock-ups and current and 
future functionalities, but it remains to be seen whether the software produced 
with this approach perform reliably better and are more likely to be used by 
these groups. At the same time we are excited about the prospects of the discus- 
sions and follow-up studies, further testing the limits of such sprints, and refining 
such IERE development methods to facilitate the creation of great experiences 
directly with users excluded from the main technological discourse, empowered 
to share their opinions, needs and aspirations. 
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Abstract. An HTTP cookie (hereinafter cookie) is a piece of information that 
maintains a state in a stateless HTTP protocol. Recently established privacy and 
security regulation imposes an obligation on the service provider to obtain the 
user’s consent to use cookies. This paper is aimed at studying usability guidelines 
of requests for consent to use cookies (hereinafter consent requests). The consent 
requests have usability issues that make it difficult for the users to choose the right 
privacy and security options. A study of privacy and security regulation is aimed at 
extracting design requirements. Manipulative designs also known as dark patterns 
are explored and applied to assess consent requests of two of the most popular 
Lithuanian news portals. An evaluation revealed the presence of dark patterns in 
consent requests as well as violation of privacy and security requirements. As a 
result, usability guidelines on the design of cookie consent requests are developed. 


Keywords: HTTP cookies - cookie consent requests - dark patterns - security - 
privacy 


1 Introduction 


An HTTP cookie is a small piece of information passed between a web server and a 
browser [1]. This is a collection of information the server creates when a user visits a 
website [2]. The information stored in the cookies helps the website to identify the user 
or restore the options set by the user — it can be login, the pages visited by the user on 
the website, or other usability options, such as language and font size. 

Cookies are used to check whether two received requests were sent from the same 
web browser and to remember the state of the website, for example, stored information 
about what goods are placed in an e-shop cart. Although the purpose of cookies is to 
track the state of the web page which would be lost when the user leaves the domain, 
more detailed purposes are often distinguished [3]: (a) strictly necessary cookies for the 
smooth functioning of the website; (b) preferences cookies, such as language, font size, 
login name, and password; (c) statistics cookies collect information about the user’s 
behavior on the website, such as links the user visited; (d) marketing cookies track user 
activity on the Internet to help deliver personalized advertising. 

Depending on the collected content, cookies may endanger users’ privacy [4]. Users’ 
devices and their information are recognized as personal space in the European Union 
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(hereinafter EU) law. According to EU law, to process users’ personal data, service 
providers must inform them about the methods of data processing and obtain their consent 
[5]. Implementing the law, cookie consent requests are introduced on the websites. 
However, the consent requests require additional mental effort that occurs unexpectedly 
when opening the page, thus usability issues arise. 

The design of consent requests usually does not help the users understand what they 
are agreeing to. Even more, designers use knowledge of user behavior (e.g., psychology, 
A/B testing) to generate as many consents as possible when designing consent requests. 
Such misleading designs are called dark patterns [6]. They are also widely used in social 
blogs, online stores, mobile apps, and even computer games [7]. Consent requests include 
various dark patterns, such as information hiding, manipulation of element positions, 
formatting, blocking of the website content, the abundance of choices, hard-to-see text 
or links to additional settings, and pre-marked choices [8]. This manipulation degrades 
the user experience of using the website. 

This paper aims to formulate usability guidelines facilitating the design of consent 
requests that meet privacy and security requirements, are easier to understand, and are 
less annoying to users. For this purpose, the next chapter explores essential privacy and 
security requirements. The third section examines dark patterns found in cookie consent 
requests. Further, the evaluation of consent requests of the most popular Lithuanian 
news portals based on revealed requirements and dark patterns is conducted. Finally, the 
usability guidelines facilitating the design of more usable cookie consent requests are 
formulated. 


2 Privacy and Security Regulation 


The laws regulating the processing of personal data in the European Union are defined 
in the General Data Protection Regulation (hereinafter referred to as GDPR). In this 
regulation, personal data is defined as any information relating to a person whose identity 
can be or has already been identified [9]. According to the EU directive on privacy 
and electronic communications supplementing it (hereinafter referred to as E-Privacy 
Directive) [3], service providers must obtain the user’s consent to use cookies that are 
not necessary for the provision of the service [10]. To obtain consent, service providers 
use requests on their websites that are accepted when the user first visits the website. 
To process personal data, service providers have to receive the consent of the person 
concerned. The GDPR sets out the rules for consent-based data processing, some of 
which relate to the usability of the consent requests [11]: 


design must ensure that the users understand what they are consenting to; 
consent must be given freely and provided in clear and understandable language; 
consent is not considered freely given if the person does not have a free choice or 
cannot refuse consent; 

e the individual must be able to refuse consent as easily as it is to give it [5, 8]. 


According to the E-Privacy Directive, cookies may be used provided that users are 
clearly and accurately informed about the purposes for which they are used [11]. If 
cookies are not necessary for the functionality provided, such as tracking cookies for 
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market research, it may be necessary to obtain user consent [12]. Users must be informed 
about what information is provided to the device they use. The user must also be able to 
opt-out of cookies when other users have access to personal information stored on the 
device. Information about the purposes and subsequent uses of cookies must be provided 
together with information about the user’s right to refuse cookies before they are used. If 
the user decides to refuse cookies, the service provider must still provide the minimum 
services — for example, the website with restricted content. 

To process user data, at least one of the several conditions for the processing of 
personal data provided for by EU laws must be met (e.g. obtaining the consent of 
the individual). If service providers seek to obtain a person’s consent, it should meet 
certain requirements and recommendations. Table I summarizes the requirements and 
recommendations that consent should be sought. 


Table 1. Privacy and security requirements for cookie consent requests 


Identifier | Condition Description 


R1 Given free will Consent must be freely given. The GDPR recommends that 
consent should not be considered freely given if the 
individual has no other choice or option to opt-out 


R2 Concrete Consent must be expressed by clear, affirmative action 


R3 Informed The user must be provided with accurate information about 
the purposes of data processing and future uses in clear and 
understandable language 


R4 Unambiguous Consent must be expressed for each purpose for which user 
data will be processed 

R5 Clear Consent requests must be separated from other matters 

R6 Continuous If consent is requested electronically, the request should be 


clear, and concise and not unnecessarily interrupt the use of 
the service 


R7 Easy disagreeing Opting out of cookies should be as easy as accepting them 


R8 The right to disagree Information about the user’s right to refuse the use of 
cookies must be provided together with information about 
the purposes of using cookies 


3 Dark Patterns in Cookies Consent Requests 


Harry Brignull defined dark patterns as a carefully crafted user interface to trick users 
into doing things they might not otherwise do [13]. Dark patterns are increasingly found 
in different digital platforms such as social blogs and online stores [7] as well as on the 
official websites of major companies such as Facebook, Amazon, LinkedIn, and Uber 
[14-16]. One of the dark patterns — “Privacy Zuckering” — was named after the Facebook 
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founder due to the difficulties of managing privacy on Facebook. Other dark patterns 
include trick questions, sneak into the basket, Roach motel, price comparison prevention, 
hidden costs, bait and switch, confirmshaming, disguised ads, forced continuity, and 
friend spam [15]. Gray et al. categorized Brignull’s identified dark patterns into five 
types: nagging, obstruction, sneaking, interface interference, and forced actions [16]. A 
study by Luguri and Strahilevitz revealed that the more aggressively dark patterns are 
applied, the more likely users are to succumb to manipulation and make choices that are 
against their interests [17]. Even seemingly small design decisions, such as highlighting 
the consent option or moving the decline button to another window, can significantly 
increase the likelihood of obtaining user consent [5]. 

Dark patterns are often observed in cookie consent requests. A study on the top 1,000 
EU websites revealed that 57.4% of requests use at least one dark pattern in 2019 [18]. 
The following design solutions containing dark patterns are met in consent requests [8]: 


e Consent walls are pop-up windows that contain part of the service provider's privacy 
policy or informative text about the use of cookies on the website. They clearly 
separate consent request from other matters, thus satisfying the GDPR requirement 
(R5). However, consent walls block access to the website until users express their 
consent. This can be seen as not freely given consent (R1 unsatisfied). Thus, the dark 
pattern of forced action can be assigned. The dark pattern of nagging is also observed 
as a consent request is displayed immediately after visiting the website. 

e Tracking walls are a type of consent walls, in which the user can only accept the use 
of cookies or leave the site. This design uses the dark pattern of forced action which 
is more aggressively applied than on consent walls. The tracking wall is a sort of 
barrier that prevents the user from taking the desired action. Therefore, this design is 
also a case of the obstruction dark pattern. 

e Reduced service is provided when the users do not accept the privacy settings. If no 
alternative is provided to access the full content (for example, a paid service option), 
the user is forced to agree with the use of cookies to reach the full functionality of the 
website. This situation can be described by the dark pattern of forced action. Also, 
redirecting a user to a version of a site with restricted content or functionality could 
be regarded as nagging. 

e Manipulation of configurations encourages users to accept the use of cookies. Visual 
manipulations such as different sizes, position, formatting (use of different colors 
and fonts), an abundance of options, and pre-marked options aim to make one option 
more noticeable, and more attractive than the other and secretly encourage the user 
to choose it. Such manipulation is characterized by the dark pattern of interface 
interference. 


Dark patterns exploit the users’ limited attention because they often multitask while 
browsing the Internet. Designers use attention diversion techniques in request configu- 
rations, which draw the user’s attention to one part of the website to divert it from other 
parts that would be more useful to the user [19]. A study conducted by Utz and others 
shows that such design decisions as moving the position of the request from the top to 
the bottom of the screen or highlighting the consent button influence the choice of users 
to express consent or disagreement with the use of cookies. The study observed that 


Improving the Usability of Requests for Consent to Use Cookies 195 


the probability of consent to the use of cookies increased from 0.16% to 83.55% when 
pre-tagged options were used in the request [18]. 


4 Evaluation of Consent Requests in Lithuanian News Portals 


Evaluation of the design of cookie consent requests was conducted on the two most 
visited news portals in Lithuania: Delfi.lt and Lrytas.lt [20]. In the assessed requests, 
consent was expressed using the “I agree” button in the main request window (R2 is 
met) (see Figs. 1 and 2). However, an option to disagree was not presented in the first 
window of any request (R7 is violated), therefore disagreement requires more clicks 
than expressing consent. Each request allows the user to change specific goal settings 


separately (R4 is met) and the consent request was clearly separated from other questions 
(R5 is met). 


THIS WEBSITE USES COOKIES 


Fig. 1. The first window of cookie consent request on Delfi.lt (https://www.delfi.lt/en/ — the 
English version of Delfi.lt) 


Delfi.lt consent request is designed in notification bar style and does not interrupt the 
use of the service (R7 is met). Lrytas.lt uses a consent wall-type request, it is displayed 
immediately after opening the website which is a case of nagging. However, its sole 
purpose is to inform the users about the use of cookies and to obtain their consent, so it 
can be said that it is not an unnecessary interruption (R6 is met). 


Irytas.lt 


Gerbiame jūsų privatumą 


| : ee 
Fig. 2. The first window of cookie consent request on Lrytas.lt (https://www.lrytas.lt/english — the 
English version of Lrytas.lt). Although the English version is activated, the request is presented 
in Lithuanian. Below two options are presented: “More options” — on the white button and “I 
agree” — on the red one. 


Although information about the user’s right to disagree was provided only by Lry- 
tas.It in the first request window, this right was indicated in both examined sites (R8 is 
met). Lrytas.lt provided an option in the main window that leads to the cookie setting 
window, so the user can understand that consent is not the only option. In the Delfi.lt 
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consent request, the cookie settings are hidden under the less visible “More information” 
dropdown. The user may not even find the hidden settings without carefully examining 
the request and think that the only option is to accept the setting of cookies (Fig. 3). This 
raises doubts about whether the consent received is freely given (R1 is violated). The 
same doubts are raised by Lrytas.lt developers decision to block the site’s content and 
secretly encourage users’ consent by highlighting the “I agree” button. 


lagree 


dia = | 


See tne partners we work with below, Expand each one to see how they process your Gata. YOU can object to legitimate interest processing per vendor. 
THIRD PARTY VENDORS ACCEPT ALL | REJECT A. 


m :Tappx a 
Privacy policy: https://werw.tappx.com/en/privacy-policy! 


Fig. 3. Expanding “More Information” on Delfi.lt. It is hard to notice that the rejection of all 
options is available, because of the small text on the right (indicated by a yellow arrow). The user 
cannot decline individual options. 


Lrytas.lt provides the users with detailed information about the data collected and the 
purposes of processing — for each partner it is indicated what information is collected, 
the expiration date of the cookie is presented and the goals are clearly stated (R3 is met) 
(Fig. 4). 


lrytas.lt 


Gerbiame jūsų privatumą 
Mes ir mūsų partneriai saugo ir (arba) pasiekia įrenginyje esančią informaciją, tokią kaip slapukai, ir apdoroja 
asmens duomenis, tokius kaip unikalūs identifikatoriai ir standartiné įrenginio siunčiama informacija, skirta toliau 
nurodytiems tikslams. Galite spustelėti, kad sutinkate, jog mes ir mūsų partneriai tvarkytų jūsų asmens duomenis 
Sials tikslais. Arba, priešingai, galite spustelėti, kad nesutinkate, arba gauti iSsamesnés informacijos ir pakeisti 
savo nuostatas prieš sutikdami. Jūsų nuostatos bus taikomos tik Siai interneto svetainei, Atkreipkite démesj, kad 
tvarkant kai kuriuos jūsų asmens duomenis jūsų sutikimas néra būtinas, tačiau turite teisę nesutikti, kad duomenys 
batų tvarkomi. Galite pakeisti savo nuostatas bet kuriuo metu grize | šią svetainę arba apsilankę puslapyje, 
kuriame pateikiama mūsų privatumo politika. 

PA 1 


Tikslūs geografinės vietos duomenys ir identifikavimas skenuojant įrenginį IŠJUNGTI > 


Suasmeninti skelbimai ir turinys, skelbimų ir jy turinio vertinimas, auditorijos j2valgos Ir jg 
produktų kūrimas IŠJUNGTI > 2 


Laikyti ir (arba) gauti informaciją įrenginyje IŠJUNGTI > 


PARTNERIAI TEISTAS INTERESAS | smmm | 


Fig. 4. Presentation of the categories of collected data. Label 1 indicates “Accept all” and “Reject 
all” options. Label 2 shows an option to decline individual categories. This implicitly means that 
all options by default are enabled. 
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Part of Delfi.lt information related to partners is not translated from English, which 
may not be understandable to all Lithuanian users (R3 is violated). 

The four dark pattern categories were found on the assessed sites. Dark patterns of 
nagging, obstruction, forced action, and interface interference are observed in all sites 
examined. The obstruction pattern makes it difficult to reject cookies when such an option 
is not provided in the first request window. The dark pattern of interface interference is 
observed on Delfi.lt where users are not informed about the user right to disagree with 
the use of cookies in the first request window. Moreover, Lrytas.lt tries to hide from 
users that some options are enabled by default. The dark pattern of interface interference 
manifests itself in the consent option that is presented in all requests more attractively 
than the option that directs to cookie settings. Delfi.lt used a less noticeable drop-down 
element for cookie settings. The dark pattern of forced action is observed only at Lrytas.lt 
in which the content is blocked until the users express their choice. 

Allin all, both examined consent requests have not complied with at least two privacy 
and security requirements. Four dark patterns were detected on Lrytas.lt, three — on 
Delfi.lt. 


5 Design Guidelines for Cookie Consent Requests 


Revealed violations of privacy and security requirements highlight the need for easily 
applicable guidelines that would support developers in ensuring the privacy and security 
of consent requests as well as following the GDPR and avoiding dark patterns. Based 
on the revealed defects, the guidelines for ensuring privacy and security requirements 
have been formulated (Table 2). 

As an example of applying guidelines, the main page of Delfi.lt cookie consent 
request is redesigned (Fig. 5) by adding the disagree option (G3). Both options require 
clicking the button which is a clear affirmative action (G1). The information that the 
user can refuse to use cookies is shown on the first page (G7). 


SIOJE SVETAINEJE NAUDOJAMI SLAPUKAI 


Naudojame slapukus, kad galétume suasmeninti turinj bei skelbimus, teiksi visuomeninés medijos ee 


funkcijas ir analizuoti srautą. Be to, svetainės naudojimo informaciją bendriname su viesuomeninés 
medijos, reklamavimo ir analizės partneriais, kurie gali ją pridėti prie kitos jūsų pateiktos arba naudojant 
paslaugas surinktos informacijos. 


Galite peržiūrėti ir rinktis, kas ir kokiu tikslu naudoja Jūsų duomenis išskleidę “Slapuky nustatymai”. 


Slapuky nustatymai v 


Fig. 5. The redesigned main page of Delfi.lt consent request (in Lithuanian): both options (agree 
and disagree) are presented as equivalent choices to meet the guidelines G1, G3, and G7. 


The example of the application of other guidelines is based on the redesign of the 
Lrytas.lt site (Fig. 6). Guideline G13 requires facilitating the reading of the lengthy 
texts. In user interface design this is usually achieved by designing a proper visual 
hierarchy. Another solution is introducing associative icons. Both solutions are involved 
in the examined page: icons facilitate quick scanning; the summary of each purpose is 
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provided in boldface. Further details of each purpose can be obtained by expanding these 
options. The options to accept or reject all options facilitate the user’s decision to refuse 
more options (G8). Each changeable option indicates its status, whether it is enabled or 
disabled (G5). More sophisticated design decisions were made to support the quicker 
perception of texts. A relation between the collected data and their processing purpose 
(G9, G10) is visualized using graphs and color codes. This solution was popularized by 
the pribot.org tool for a showing of a privacy policy on Twitter. 


Table 2. Usability guidelines for consent requests 


Identifier | Description 


G1 Consent must be expressed by clear affirmative action, such as clicking a button or checking a 
checkbox 

G2 To ensure that consent is given freely, consent and tracking walls should not be used to obtain 
consent 

G3 Disagreeing with the use of cookies should be as simple as accepting their use. The “Disagree” 


option has the same importance as the “Agree” option, thus both options must require the same 
number of mouse clicks 


G4 If cookies are used for several purposes, consent should be requested for each purpose 
separately 

G5 Tf the request uses elements whose state can be changed (e.g., disable/enable, agree/disagree), 
the setting must have a clear indication of the state (i.e. whether it is enabled/disabled or 
agreed/disagreed) 

G6 The default settings can only be used to activate strictly necessary cookies 

G7 The users must be informed about their right to refuse the use of cookies in the first request 
window 

G8 Tf the consent request requires the user to express an opt-out or objection to more than one 


legitimate interest, data processing, or partner purpose, an option to facilitate changing the 
settings of many elements must additionally be provided, such as to reject all listed objectives 


G9 The user must be provided with accurate information about the purposes of data processing, 
data collected, cookies used, and their possible uses in the future, including how cookies can be 
used and how long they will be valid 


G10 If a query contains a lot of information in one place, it should be presented a in visual structure 
highlighting important points 


G11 The user must be informed if the data is shared with third parties 


G12 The request must include a link to the website’s privacy policy, where the user can find more 
information about data processing and storage 


G13 Getting to know the purposes of data processing should take acceptable time 


G14 The design has to ensure that conditions are created for making an informed decision. The 
elements used in the request must not distract the users’ attention or encourage them to make 
specific decisions 


G15 User-relevant information should not be hidden under hard-to-see elements or on additional 
request pages. If due to the design of the request, it is not possible to present all the information 
on one page, the request should provide a link where users can find the full text of the 
information 
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Gerbiame jūsų privatumą 


Mes ir mūsų partneriai saugo ir (arba) pasiekia įrenginyje esančią informaciją, tokią kaip slapukai, ir apdoroja asmens duomenis, 
tokius kaip unikalūs identifikatoriai ir standartinė jrenginio siunčiama informacija, skirta toliau nurodytiems tikslams. 


Atkreipkite dómesj, kad tvarkant kai kuriuos jūsų asmens duomenis jūsų sutikim: 


ku je pateikiama misy priv: atumo politika 


1) 


iksliy geografinés vietos duomeny naudojimas ir identifikavimas skenuojant jrenginj > 
Suasmeninti skelbimai ir turinys, skelbimų ir jų turinio vertinimas, auditorijos jžvalgos > 
r produktų kūrimas 
gB Laikyti ir (arba) gauti informaciją įrenginyje > 
Renkami duomenys 4 Tikslas 4 


|---— 


Turinio pateikimas | 


SANTRAUKA KONKRETUS TIKSLAI IR FUNKCIJOS PARTNERIAI TEISETAS INTERESAS | resmmsem | 


Fig. 6. Redesign of data processing purposes (in Lithuanian): 1) the text of processing purpose is 
augmented by icons to facilitate quicker content perception (G13); 2) an option to accept or reject 
all options is provided (G8); 3) all elements whose state can be changed have a clear indication 
of the state, in the example — disabled (G5); 4) the chart relates the purposes (right edge) with a 
processed data (left edge), mouseover highlights selected options (G9, G10). 


6 Conclusions 


This paper aims to establish usability guidelines that would support the usability of 
consent requests. Recently established regulation for the processing of personal data 
requires service providers to obtain the user's informed consent to use cookies that are 
needed for the provision of the service. Implementing this law, each website offers a 
consent form that contains all the required information. These forms interrupt the current 
activity and require users to focus on reading a complex legal text. The very fact of the 
interruption is annoying. The very fact of the interruption is annoying. The unusable 
design adds frustration. Therefore, many users decide to minimize the disturbance by 
choosing the quickest solution that, unsurprisingly, consciously or not, is beneficial for 
service suppliers. However, unethical behavior will not go unnoticed, thus damaging 
the users’ perception of the provider’s brand value. As all sites are required to obtain 
consent, the sites with more usable solutions will benefit. 

The study of GDPR revealed the requirements for ensuring the privacy and security 
of processing personal user data in web services. These requirements relate usability 
of consent requests design. Since providers are interested in a specific user choice, the 
existence of dark patterns in consent requests was checked. 


200 K. Lapin and L. Volungevičiūtė 


Investigation into dark patterns revealed the presence of four types of dark patterns 
in the examined consent requests. It was found that the first screen in both examples 
hides the possibility to reject or choose appropriate cookies on a website. The texts are 
provided in the form of text walls that hinder their readability. The consent wall prevents 
the usage of the service until the user will accept with consent request. So, the user’s 
free will is questionable. 

Although service providers have an interest to push desirable behavior by accepting 
all cookies, care for brand value requires considering the users’ interests, too. Developed 
usability guidelines are aimed at facilitating the design of consent requests in future 
designs. The redesign examples illustrate the usefulness of the developed guidelines. 
Most of the developed guidelines simply suggest the design decision, such as providing 
a button that could be clicked or including both options on the first screen. Others are 
formulated more abstractly. For example, facilitating readability or avoiding distraction. 
The latter guidelines are subject to further investigation to provide more specific design 
solutions that would be ready to apply. 


References 


1. Kristol, D.M.: HTTP Cookies: standards, privacy, and politics. ACM Trans. Internet Technol. 
1, 151-198 (2001). https://doi.org/10.1145/502152.502153 
2. Using HTTP cookies — HTTP—MDN. https://developer.mozilla.org/en-US/docs/Web/ 
HTTP/Cookies. Accessed 28 Sept 2022 
3. Koch, R.: Cookies, the GDPR, and the ePrivacy Directive. https://gdpr.eu/cookies/. Accessed 
28 Sept 2022 
4. Hamed, A., Kaffel-Ben Ayed, H., Kaafar, M.A., Kharraz, A.: Evaluation of third party tracking 
on the web. In: 8th International Conference for Internet Technology and Secured Transactions 
(ICITST-2013), pp. 471—477 (2013). https://doi.org/10.1109/ICITST.2013.6750244 
5. Nouwens, M., Liccardi, I., Veale, M., Karger, D., Kagal, L.: Dark patterns after the GDPR: 
scraping consent pop-ups and demonstrating their influence. In: Proceedings of the 2020 CHI 
Conference on Human Factors in Computing Systems, pp. 1-13. Association for Computing 
Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3313831.3376321 
6. Brignull, H.: Bringing Dark Patterns to Light. https://harrybr.medium.com/bringing-dark-pat 
terns-to-light-d86f24224ebf. Accessed 28 Sept 2022 
7. Mathur, A., et al.: Dark patterns at scale: findings from a crawl of 11K shopping websites. 
Proc. ACM Hum.-Comput. Interact. 3, 81:1-81:32 (2019). https://doi.org/10.1145/3359183 
8. Gray, C.M., Santos, C., Bielova, N., Toth, M., Clifford, D.: Dark patterns and the legal 
requirements of consent banners: an interaction criticism perspective. arXiv:2009.10194 [cs] 
(2021). https://doi.org/10.1145/3411764.3445779 
9. Data protection under GDPR. https://europa.eu/youreurope/business/dealing-with-custom 
ers/data-protection/data-protection-gdpr/index_en.htm. Accessed 28 Sept 2022 
10. Santos, C., Bielova, N., Matte, C.: Are cookie banners indeed compliant with the law? Deci- 
phering EU legal requirements on consent and technical means to verify compliance of cookie 
banners. http://arxiv.org/abs/1912.07144 (2020). https://doi.org/10.48550/arXiv. 1912.07144 
11. General Data Protection Regulation (2016). https://eur-lex.europa.eu/legal-content/EN/TXT/ 
PDF/?uri=CELEX:02016R0679-20160504&from=LT 
12. Buthelezi, M.P., Loock, M.: User online privacy and identity management behaviors: a com- 
parative study. In: 2014 Annual Global Online Conference on Information and Computer 
Technology, pp. 53-57 (2014). https://doi.org/10.1109/GOCICT.2014.14 


13. 


14. 


15. 
16. 


17. 


18. 


19. 


20. 


Improving the Usability of Requests for Consent to Use Cookies 201 


Brignull, H.: Dark Patterns: inside the interfaces designed to trick you. https://www.the 
verge.com/2013/8/29/4640308/dark-patterns-inside-the-interfaces-designed-to-trick-you. 
Accessed 29 Sept 2022 

Schlosser, D.: LinkedIn Dark Patterns. https://medium.com/@ danrschlosser/linkedin-dark- 
patterns-3ae726fe1462. Accessed 29 Sept 2022 

Brignull, H.: Dark Patterns. https://www.darkpatterns.org/index.html. Accessed 11 Feb 2021 
Gray, C.M., Kou, Y., Battles, B., Hoggatt, J., Toombs, A.L.: The dark (patterns) side of 
UX design. In: Proceedings of the 2018 CHI Conference on Human Factors in Computing 
Systems, pp. 1-14. Association for Computing Machinery, New York, NY, USA (2018). 
https://doi.org/10.1145/3173574.3174108 

Luguri, J., Strahilevitz, L.: Shining a Light on Dark Patterns. Social Science Research 
Network, Rochester, NY (2019). https://doi.org/10.2139/ssrn.343 1205 

Utz, C., Degeling, M., Fahl, S., Schaub, F., Holz, T.: (Un)informed consent: studying GDPR 
consent notices in the field. In: Proceedings of the 2019 ACM SIGSAC Conference on Com- 
puter and Communications Security, pp. 973-990. Association for Computing Machinery, 
New York, NY, USA (2019). https://doi.org/10.1145/3319535.3354212 

Chatellier, R., Delcroix, G., Hary, E., Girard-Chanudet, C.: Shaping choices in the digi- 
tal world. From dark patterns to data protection: the influence of UX/UI design on user 
empowerment. Technical report, CNIL (2019) 

gemiusAudience: September overview of the most popular Lithuanian websites (in Lithua- 
nian). http://www.gemius lt/interneto-ziniasklaidos-naujienos/gemiusaudience-rugsejo-men 
esio-apzvalga-641 1.html. Accessed 12 Oct 2022 


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, 
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate 
credit to the original author(s) and the source, provide a link to the Creative Commons license and 
indicate if changes were made. 


The images or other third party material in this chapter are included in the chapter’s Creative 


Commons license, unless indicated otherwise in a credit line to the material. If material is not 
included in the chapter's Creative Commons license and your intended use is not permitted by 
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from 
the copyright holder. 


D) 


Check for 
updates 


Transdisciplinary Approach to Virtual 
Narratives - Towards Reliable Measurement 
Methods 


Grzegorz Pochwatko!®9 ©, Daniel Cnotkowski? ©, Paweł Kobyliński ©, 
Paulina Borkiewicz?®, Michał Pabiś-Orzeszyna*©, Mariusz Wierzbowski? ©, 
and Laura Osęka! © 


' Institute of Psychology, Polish Academy of Sciences, Warsaw, Poland 
gpepsych.pan.pl 
2 Laboratory of Interactive Technologies, National Information Processing Institute, 
Warsaw, Poland 
daniel.cnotkowskieopi.org.pl 
3 Visual Narratives Laboratory vnLab, Lodz Film School, Łódź, Poland 
4 Film and Audiovisual Media Department, University of Lodz, Łódź, Poland 


Abstract. We have recently observed intense growth in the film industry's inter- 
est in VR creations. Cinematic VR artists encounter challenges that result from 
discrepancies between established techniques of storytelling, stylistic conven- 
tions, and organizational culture indicative of traditional modes of film practice 
and the requirements of the new medium and new audience. We propose a trans- 
disciplinary approach to cinematic VR research. Thanks to the cooperation of art 
& science - a collaboration between psychologists, information technology spe- 
cialists, film scholars, and filmmakers will contribute to the emergence of a new 
VR narrative paradigm. We use a number of quantitative and qualitative methods 
to study the perception of cinematic VR works, an illusion of spatial presence and 
copresence, attention, emotions, and arousal of its users, narrative understanding, 
and character engagement. We measure participants” reactions in many indepen- 
dent ways: in addition to subjective assessments and declarative methods, we 
use more objective data: eye tracking, multi-point position skeleton tracking, and 
psychophysiological responses. We show the effectiveness of the adopted app- 
roach by studying three artistic cinematic VR works: narrative and non-narrative, 
live-action, and animated. We compare the user experience and present the pos- 
sibilities of interpretation and feedback benefits for art. 


Keywords: cinematic VR * research app * usability * presence 


1 Introduction: Motivation and Related Work 


Any VR experience allows the subjects to choose their paths of visual attention 
[4,5] and spatial behavior [11,12, 14,15], which leads to different emotional reactions 
[7,9, 16]. VR designers that employ any system of attention cues and plan to manipulate 
emotional arousal in immersive storytelling, may be interested in measuring the effec- 
tiveness of their actions. Tools are emerging, but so far, they are limited (e.g., based 
on the eye, and head movements only [13]). Our transdisciplinary multi-dimensional 
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approach addresses the problem of reliable measurements of the audience response to 
cinematic VR experiences. 

The number of virtual training and research environments in which participants are 
free to explore limited or (nearly) unlimited space increases [1,8, 18,19]. Analysis of 
participants’ movement, physiological responses and declarations allows inferences on 
the impact of the experience, approaching or avoiding objects, being relaxed or stressed, 
the response to virtual agents (passive or active characters). 

Cinematic VR experiences (CVR) are limited compared to computer-generated 3D 
environments because they do not allow users to walk freely or interact with objects 
and characters. CVR is typically experienced in a linear story, but the use of spherical 
projection and central placement of the participant can make the experience different 
for each viewer, depending on the creator’s intention. CVR can be more immersive than 
computer-generated environments due to its higher level of photorealism, which can 
lead to greater spatial presence and co-presence illusions. Some CVR experiences 
can elicit intense arousal in viewers due to the inclusion of primary distress signals, 
such as sudden loud noises or changes in lighting [3, 10]. Increased arousal may not be 
related to the content of the narrative. However, high tonic activity (TA) can make it 
easier to elicit a response to important parts of the story. It can also make it difficult 
if the creator puts too many primal stimuli and TA reaches the ceiling level. Situations 
in which the story evokes arousal are much more interesting. These can be e.g., the 
appearance of characters, interaction between them, and music. Reliable measurement 
enables us to register, e.g., increased heart rate or skin conductance level as a response 
to objects and characters. 

Successful storytelling in CVR depends on participants being placed in the right 
direction at critical moments, giving them the physical ability to follow what is visible 
and audible and respond to it in a predicted way. 

Art creators and filmmakers are looking for a new VR narrative paradigm, and 
want to do it systematically. VR technology that made it possible to shoot stereoscopic 
360° videos and create interactive experiences also allows us to analyze participants’ 
reactions to the virtual content. The use of this information depends on close, transdis- 
ciplinary collaboration between art & science. 


2 Overview of the Cinematic VR Research Method 


2.1 The Research Application 


The research application was created using the Unity engine, which is the core of the 
method and enables the collection of behavioral data and communication with external 
devices. However, the transdisciplinary approach to CVR research also involves the use 
of a variety of other methods to provide additional insights into viewer reactions to 
CVR and ensure the reliability of the research. 


2.2 Screening 


Before inviting participants to the lab, we ensure that there are no health issues or med- 
ications that could interfere with psychophysiological measurements (PF), and that par- 
ticipants have not consumed any psychoactive substances, alcohol, or coffee. We also 
verify that participants’ vision is corrected if necessary. 
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2.3 Measures Used Before VR Experience 


Before watching the experience, participants complete questionnaires to assess psycho- 
logical variables, individual differences, attitudes, and traits that may impact their per- 
ception of the experience or be affected by it. The specific methods used in these ques- 
tionnaires may be tailored to the content of the experience or used consistently across 
all experiences, such as the mood scale and the Self-Assessment Manikin (SAM). [2,6]. 
We measure temperamental traits, which are relatively stable human characteristics, to 
better understand participants’ motor behavior and emotional reactions. We also assess 
participants’ mood and emotional state to determine the positive or negative impact of 
the experience on them. 


2.4 Baseline Measures 


Research with the use of PF and behavioral measures require a baseline measurement 
in neutral conditions. 


Calibration: 1) PF measures (e.g., EDA, ECG, EEG! if applicable) - require external 
equipment, and trained personnel; 2) eye-tracker - standard procedure from the manu- 
facturer; 3) body motion and position (head, arms, legs, torso and waist). Participants 
perform a series of standard movements (head and whole body rotations) - providing 
data for the correction of motor artifacts in the PF signal. 


Baseline Sequence: 1) 3 min in a quiet, minimalist environment, eyes open’; 2) 360 
neutral and affective videos - reference to PF response. Videos by Li et al. [7]: Aban- 
doned City, valence 3.33/9, arousal 3.33/9, duration 50s; Spangler Lawn, v. 5.9/9, a. 
3.27/9, d. 58 s; Seagulls, v. 6/9, a. 1.6/9, d. 60s; Walk Among Giants, v. 5.79/9, a. 2/9, 
d. 60s; Tahiti surf, v. 7.1/9, a. 4.8/9, d. 60 s. 


2.5 Cinematic VR Experience Test 


After completing the baseline, the target CVR experience begins. Participants start 
watching from a fixed position and are free to move around (3DoF) and explore the 
experience (6DoF). Both body and eye movements are recorded. Thanks to markers, it 
is possible to time synchronize the PF signal recorded with external equipment. 


2.6 Measures Applied After VR Experience 


Participants complete a series of questionnaires. Presence and co-presence, repeated 
measure of mood and SAM, user experience and declarative evaluation of the baseline 
and target CVR is measured. Questionnaires specific for the tested VR experiences are 
introduced. An in-depth interview regarding the content and impressions related to the 
experience is conducted. 


' EDA - electordermal activity, ECG - electrocardiogram, EEG - electroencephalogram. 
? provides comparative data for PF analyzes of reactions to stimuli; (for EEG recording add a 
3-minute measurement with eyes closed). 
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2.7 Digital Markers Calibration 


Synchronization with external devices (e.g. BIOPAC, NEUROSCAN) requires the abil- 
ity to send digital triggers via LPT or USB TTL adapter. Unity allows us to send 
time/event related and conditional markers with millisecond precision, which is impor- 
tant when analyzing subtle changes in PF signal. We use 8-bit markers (1-255). Record- 
ing frequency of 1000 Hz or 2000 Hz enables precise reading of the marker. A test is 
performed before each series of studies. A series of markers in 1-1000 ms intervals is 
sent and registered on an external device with 20000 Hz resolution. The marker times 
are compared and corrected if necessary. 


3 Current Research - Method 


3.1 Participants 


247 people participated in the study (157 females, 88 males, 2 persons did not indicate 
gender or chose a different answer), mean age 31.65 SD = 9.32. Half of the participants 
were involved in creative activities, while the other half were recipients of art (interested 
in, e.g., cinema or art exhibitions - screened to rule out the distracting influence of low 
motivation) (Fig. 1). 


Fig. 1. One of the participants of the cinematic VR experience study. 


The study involved volunteers recruited through online ads. Screening was per- 
formed to exclude people who had contraindications to participate in VR experiments 
(e.g. problems with stereoscopic vision) or used substances that could influence PF 
measurements (e.g. anti-arrhythmic drugs). 


3.2 Materials and Apparatus 


Measures. Due to length restrictions we limit the scope of this paper and analyze only 
a part of the data. The full list of measurements includes: 1. questionnaires related to 
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the CVR productions and individual differences that may affect reception of them, 2. 
mood and well-being repeated measure, 3. PF data (eye movements and fixations, pulse, 
EDA), 4. post-VR questionnaire that included: 


— User experience experimental setting and equipment scale (from | definitely not to 
5 definitely yes) 
e instructions: The instruction given by the experimenter was clear and under- 
standable 
lab.comfort, .hot, .tooSmall, e.g., The room where the simulation took place 
made me feel comfortable, 
appar.comfort The apparatus used during the study was comfortable, 
HMD.heavy The HMD was too heavy, 
HMD.hot Jt was too hot with the HMD on, 
HMD.unhyg The HMD looked unhygienic, 
Bothering.wires Wires connected to HMD bothered me, 
Bothering.noVisual (...) after putting on HMD, I couldn 't see the experimenter, 
Bothering.observed It bothered me that other people were watching me, 
afraid.fall After putting on the HMD, I was afraid that I might bump into some- 
thing or fall over, 
afraid.damage The equipment used for the training was too delicate, I was afraid 
that I might damage something. 
— User state - Self-Assessment Manikin (SAM) [2,6] 
e valence (positive - negative), 
e arousal (high - low), 
e dominance/control (low - high); 
— Short Presence Scale (SPS) created by the authors, based on the MEC-SPQ [17] 
(e.g., I felt as if I were taking part in events, not just watching them): Spatial Presence 
- Self Location (SPSL), four items; Spatial Presence - Possible Actions (SPPA), four 
items; Suspension of Disbelief (SoD), two items; Attention Engagement (AE), two 
items. 


Hardware. 64bit PC Intel Xeon I7, RAM: 32 GB, GTX 1080TI (Win10), a Neuroscan 
device, 6 HTC ViveTrackers 2.0, Vive Pro Eye HMD, accessories: elastic wrist, ankle, 
belt, chest straps with tracker mounts and replacement facial interface from VRCover 
with single use hygienic covers. 


Experimental Application. Experiment application was prepared in Unity Engine. It 
consisted of a single scene environment, with a simple cubical room as a neutral start- 
ing point for the stimulus. Important features were: playback of hi-res 360 videos with 
ambisonic audio, sending event markers using parallel port and gathering user posi- 
tional and eye tracking data with high resolution. Video playback was solved by usage 
of Unity Store asset designed for uninterrupted 4K video playback with stereoscopic 
360 video support. To solve ambisonic audio playback we have used a Facebook 360 
spatial decoder, synchronising its playback internally with video. This approach com- 
bined with splitting audio and video streams into two separate files, proved itself suffi- 
cient to meet aforementioned assumptions. For parallel port communication an external 
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Fig. 2. Diagram of sources of gathered data. 


library was used. To achieve best precision of event markers delivery we have devel- 
oped solution allowing us to prepare event markers in advance and separate task of 
timing and sending those to parallel port output into a separate thread, with help of C# 
built in Stopwatch class millisecond precision of the markers was achieved. Highest 
resolution of data gathering possible was limited by Unity to 90 Hz as the core of the 
application is constrained by the application framerate (consistent with HMD’s fram- 
erate), which proved to be sufficient. Data we gathered was obtained as follows: from 
within scene - positional and rotational data of each connected VR device and acces- 
sory, using OpenVR API we also gathered positional data, while additionally collecting 
velocity and pose matrix data of those devices, by using SRanipal API we gathered eye 
tracking data consisting of combined eye, left and right eye gaze data and current eye 
focus point. Outgoing event markers were recorded allowing us to sync with PF data 
(Fig. 2). 


3.3 Procedure 


Six HTC Vive Trackers were attached at ankles, wrists, belt and chest areas with special 
anti-slip straps. Participant was seated in a chair where electrodes and sensors were 
applied. Then, the HMD was put on, electrode readouts were checked, and per-user eye 
tracker calibration was performed. After setup all lights in the experiment room were 
turned off, baseline and target registration followed under supervision. 
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Table 1. Global scores and Personal characteristics 


Variables range | Global Gender M Gender F Art recepient | Art creator 
Mean | SD | Mean SD | Mean SD | Mean SD | Mean SD 

SAM:sad/happy 1-5 |3.83 |0.91 | 3.80 0.94 | 3.84 0.90 | 3.79 0.87 | 3.88 0.96 
SAM:calm/aroused 1-5 |1.98 | 1.04 | 1.93 1.01 | 2.01 1.06 | 2.03 1.08 | 1.92 1.00 
SAM:controlled/in control | 1-5 |3.66 | 1.25 | 3.71 1.11 | 3.64 1.32 | 3.58 1.35 | 3.75 1.14 
Instructions 1-5 |4.93 |0.32 | 4.92 0.35 | 4.93 0.30 | 4.92 0.33 | 4.93 0.31 
Lab.comfort 1-5 | 4.76 | 0.56 | 4.72 0.56 | 4.78 0.56 | 4.78 0.60 | 4.74 0.53 
Lab.hot 1-5 | 1.46 | 0.99 | 1.53 1.03 | 1.43 0.98 | 1.47 1.01 | 1.46 0.99 
Lab.tooSmall 1-5 | 1.25 | 0.74 | 1.30 0.75 | 1.22 0.74 | 1.25 0.73 | 1.25 0.75 
Appar.comfort 1-5 | 3.96 | 1.14 | 4.03 1.10 | 3.91 1.16 | 3.96 1.20 | 3.94 1.08 
HMD.heavy 1-5 | 2.76 | 1.36 | ***2.34 | 1.32 | ***3.00 | 1.33 | 2.71 1.42 | 2.82 1.31 
HMD.hot 1-5 |1.81 |1.08 | 1.70 1.07 | 1.88 1.08 | 1.80 1.09 | 1.83 1.07 
HMD.unhyg 1-5 |120 |0.56 | 1.21 0.59 | 1.20 0.55 | 1.19 0.59 | 1.22 0.54 
Bothering.wires 1-5 |175 |1.07 | 1.84 1.12 | 1.69 1.02 | 1.67 0.97 | 1.84 1.15 
Bothering.noVisual 1-5 |133 |0.81 | 1.30 0.75 | 1.34 0.85 | 1.40 0.91 | 1.25 0.71 
Bothering.observed 1-5 |127 |0.67 | 1.20 0.61 | 1.32 0.70 | +1.35  |0.76 | +1.2 0.55 
Afraid.fall 1-5 |158 | 1.01 | 1.56 0.97 | 1.58 1.02 | 1.62 1.12 | 1.54 0.90 
Afraid.damage 1-5 |150 |0.90 | 1.57 0.88 | 1.46 0.91 | 1.52 0.96 | 1.49 0.84 
SPSL 1-7 |421 |1.40 | 4.20 1.33 | 4.22 1.45 | **4.43 | 1.42 | **3.99 | 1.37 
SPPA 1-7 |3.97 | 1.58 | **3.62 |1.54 | **4.16 1.59 | **4.24 | 1.61 | **3.69 | 1.53 
SPAI 1-7 |5.98 |1.17 | 6.02 1.05 | 5.94 1.24 | 6.00 1.12 | 5.95 1.23 
SPSD 1-7 |3.70 |1.73 | 3.58 1.76 | 3.75 1.72 | ***4.07 | 1.72 | ***3.33 | 1.68 


Significance level: *.05, **.01, ***.001, +marginally significant. 


4 Results 


4.1 User Experience - Experimental Setting and Equipment Evaluation 


The overall user experience rating was very good. The pattern of the results of individual 
aspects was as predicted - see Table | “Global”: 1) The instructions provided by the 
experimenter and in the app were clear and understandable; 2) Lab comfort was rated 
very high, which is consistent with temperature and lab size ratings; 3) Apparatus was 
comfortable, although rated lower than lab in general. HMD - appropriate weight and 
temperature, hygienic (important in times of epidemic threat); 4) Neither equipment 
nor experimental situation bothered the participants. Participants were not bothered 
by the fact that they did not have visual contact with the experimenter after wearing 
HMD, or that someone could observe them during the VR experience; 5) Participants 
were not afraid to fall or bump into something. They also did not see the equipment 
as fragile and easy to damage. 6) Differences; Women considered HMD significantly 
heavier than men. Art recipients were more concerned about being watched than the 
art creators. Surprisingly we noticed that some aspects of comfort differed between 
narrative vs. non-narrative and live-action vs. animated conditions. 
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Table 2. Contents characteristics 


Variables range | Narrative Non-narrative | Live-action Animated 
Mean SD | Mean SD | Mean SD | Mean SD 

SAM:sad/happy 1-5 | 3.88 0.91 3.74 0.91 | ***3.62 | 0.94 | ***4,24 | 0.72 
SAM:calm/aroused 1-5 |1.05 1.03 | 2.05 1.07 | ***2.15 | 1.10 | ***1.66 | 0.84 
SAM:controlled/in control | 1-5 | ***3.86 | 1.17 | ***3.25 | 1.31 | ***3.45 | 1.32 | ***4.06 | 0.98 
Instructions 1-5 4.92 0.36 4.95 0.22 | 4.94 0.31 | 4.91 0.33 
Lab.comfort 1-5 4.80 0.55 | 4.67 0.57 | 4.76 0.55 | 4.76 0.59 
Lab.hot 1-5 |**1.33 |0.87 | **1.74 |1.17 | 1.46 1.01 | 1.47 0.96 
Lab.tooSmall 1-5 | 1.30 0.81 | 1.14 0.55 | **1.12 | 0.52 | **1.48 | 1.00 
Appar.comfort 1-5 | 3.99 1.12 | 3.88 1.20 | 3.86 1.15 | 4.14 1.10 
HMD.heavy 1-5 | 2.78 1.37 | 2.73 1.36 | ***3.01 | 1.36 | ***2.31 | 1.25 
HMD.hot 1-5 | ***1.64 | 0.99 | ***2.16 | 1.17 | 1.89 1.09 | 1.65 1.03 
HMD.unhyg 1-5 | 1.19 0.57 | 1.24 0.56 | 1.17 0.48 | 1.26 0.69 
Bothering.wires 1-5 | 1.69 0.98 | 1.89 1.22 | ***1.91 | 1.15 | ***1.46 | 0.83 
Bothering.noVisual 1-5 1.30 0.80 1.39 0.85 | 1.32 0.79 | 1.33 0.86 
Bothering.observed 1-5 | 1.28 0.65 | 1.26 0.71 | 1.26 0.68 | 1.29 0.65 
Afraid.fall 1-5 | *1.67 1.03 | *1.39 0.95 | ***1.42 | 0.94 | ***1.88 | 1.07 
Afraid.damage 1-5 1.45 0.83 | 1.61 1.03 | **1.60 | 0.98 | **1.31 | 0.67 
SPSL 1-7 4.12 1.45 | 4.41 1.27 | **4.03 | 1.52 | **4.57 | 1.06 
SPPA 1-7 | 3.99 1.65 | 3.93 1.45 | **3.76 | 1.59 | **4.37 | 1.51 
SPAI 1-7 5.93 1.16 | 6.08 1.20 | ***5.78 | 1.30 | ***6.34 | 0.76 
SPSD 1-7 | 3.73 1.77 | 3.64 1.66 | **3.48 | 1.63 | **4.13 | 1.83 


Significance level: *.05, **.01, ***.001, +marginally significant. 


4.2 User State - Emotion, Arousal and Control 


A key ethical issue was to ensure that participants were in a good emotional state upon 
completion of the study. This goal was achieved: participants felt rather happy, calm and 
in control. As expected, there were no differences in SAM scale (Table 1). Differences 
appeared between types of content (Table 2). As expected, participants felt more in 
control in narrative compared to non-narrative condition. They also felt more happy, 
aroused and in control in the animated compared to live-action condition. 


4.3 Presence 


Global level of presence was moderate (SPSL and SPPA, see Table 1 “Global”). SPPA 
were rated lower than SPSL. There are individual differences; men rated SPPA below 
the midpoint of the scale, whereas women rated it higher, above the midpoint of the 
scale. More differences were found between art creators and recipients: Art creators 
rated SPSL&PA significantly lower than recipients. They scored also lower in SPSD. 
Ratings in all presence sub-scales were lower in live-action compared to animated 
condition (Table 2). 
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5 Discussion 


We presented scientific tools for researching virtual narratives - 360 3D CVR. Our meth- 
ods are comfortable and safe for participants. They rate high conditions in the lab and 
the equipment used (in terms of e.g. comfort and hygiene). There are also no significant 
distractions from equipment or procedure (comp. SAM and presence scales). After the 
study, participants feel happy, calm and in control. The level of self-location is slightly 
above average, indicating illusion of presence. The level of possible actions is lower, 
because CVR is not interactive. This may lower the overall presence level, as CVR 
photorealism may trigger an expectation of interaction. Another reason could be exper- 
imental situation: participation in the study may trigger participants’ urge to carefully 
analyze the presented materials. Attention involvement is very high and suspension of 
disbelief low. This may cause the participants to notice all the flaws of the cinematic 
experience while being distracted by the internal and external stimuli. This is in line 
with the differences observed between art creators and recipients. Creators (being more 
critical) have lower scores on the presence subscales except for attention involvement. 
They analyse the experience more, and therefore they have lower suspension of disbelief 
and presence. Differences between animated and live-action experiences may be due to 
the contents of the experiences. Live-action VR is very realistic, whereas animated VR 
looks artificial - expectations differ. 

The unexpected differences between narrative vs non-narrative can be explained by 
artifacts that possibly appeared in between-subjects design. The lab and HMD tem- 
perature ratings differed, which coincided with the execution of one of the conditions 
(experiences were tested right after they were produced), they could cause systematic 
differences in ratings. It needs further investigation, as there is no theoretical reason for 
them to differ this way (especially as there were no differences in other factors, e.g. 
gender). 

There are differences in self assessment of emotional state between contents con- 
ditions, but not in art-related and gender groups. This further supports the explanation 
coming from the differences in the tested narratives which match the more global clas- 
sification of contents groups. A further investigation is needed with a more comparable 
material - for example an experience with the same content but a different form: live- 
action vs animated. It is worth noting that despite differences, the participants in all 
groups are satisfied, calm and in control after completing the study. 


6 Conclusions 


In terms of user experience and predictability of results, we have created and presented 
an effective method of CVR research. It can provide an insight into the perception of 
CVR experiences that can be used by art creators to better understand their audience 
and improve their means of expression. The obtained data is also attractive for various 
fields of science. This gives hope for closer cooperation between art&science and the 
improvement of both CVR productions and tools for studying them. 
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Abstract. Despite the spread of augmented reality (AR) systems and its appli- 
cations onto a number of various areas, the adoption of AR in industrial context 
is relatively limited. We decided to conduct an exploratory user study to define 
the eventual singularities that might be associated with the barriers for HMD AR 
technology adoption in the industrial settings, as recent works presented potential 
benefits of its applications with regard to specific 3D measurement data inter- 
pretation. The task-based study was designed to engage users with interaction of 
volumetric data of static and time series nature. We compared actions of users 
performed in lab vs. in situ conditions simulating real, process tomography mea- 
surement data visualisations for granular bulk solids flow in large containers. Study 
results revealed concrete directions for further work that might eventually enable 
wider adoption of HMD AR systems in the industrial context in terms of specific 
gestural interaction and visualisation techniques development. 


Keywords: augmented reality - gestural interaction - measurement data 
visualization - industrial data analytics 


1 Introduction 


Technological progress significantly influences the development of systems supporting 
access to information and data visualization. Increasing the access paths to the sense of 
sight, and changing perception of the real environment, by providing additional infor- 
mation generated by computer systems against the background of the natural world, 
is crucial for applying augmented reality (AR) in different domains of life [1-3]. The 
study of the ways of communicating and interacting with augmented reality elements 
is an essential subject of research undertaken on many levels and fields of application 
of AR. The use of this technology is changing the way we work. AR fulfills the func- 
tion of supporting activities performed physically by providing the user with a visual 
perception beyond the real world [4, 5]. AR allows you to enrich the real world with 
additional content, such as video or image, which allows the user to perform activities 
simultaneously in the real and digital world. Special attention is given to the use of AR 
in industrial applications [6]. 
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There is a potential to further enrich the experience and usability of the AR Head- 
mounted devices (HMD AR) in the industrial context with gestural interaction and 
input. This would be especially helpful for monitoring and maintaining the reach of 
ongoing processes in measurement data, where HMD AR brings the extra value of 
freeing hands from the need to constantly hold the devices, a requirement for smaller 
AR-enabled equipment such as tablets. Gesture interaction with hands-free devices, like 
touch screen surfaces, is widely explored in literature [7]. However, as the understanding 
how the context of use influences gesture interaction with 3D data is still challenging, 
we designed an experiment conducted in both lab conditions as well as in situ with real 
industrial flow rig settings. This exploratory study investigates how users would interact 
with the measurement data of a 3D nature with the aid of HoloLens HMD in different 
settings. 


2 Related Work 


The interaction issue with AR systems has been well investigated. From this point of 
view, we can analyse users in terms of interaction with objects/data visible in augmented 
reality. Most researchers focus only on visualizing objects/data and developing better 
methods for gesture recognition. However, interaction with the augmented reality world 
also includes interaction with AR objects or datasets and directly influencing their form 
to analyse information hidden in objects or datasets. 


2.1 Gestural Interaction and Data Visualization in AR Systems 


The comparison of interaction in AR systems, between hand gesture-based interaction 
and multi-touch interaction, in terms of visual contexts, shows the advantage of hand 
gestures. [8]. Hand gesture interaction is faster than multi-touch interaction in regard to 
task completion time. There have been studies conducted to determine which gestures are 
the most intuitive for users [9]. Performing movements such as scaling, moving, deleting 
and approving are often used during user studies. However, the common approach is to 
show directly what the user should achieve and then they must make a move by which 
they want to achieve a specific goal. Additionally, gestural interaction is investigated on 
a general, universal object without special purposes; what causes tasks to be dedicated 
to object manipulation without a wider, determined aim. Such gesture interaction can 
cause a lack of understanding of the full interaction of users with the problem posed 
to solve and limit natural user interactions with AR data. Another study worth noting 
is the research on stock exchange data visualization and its use in AR [10]. The 3D 
representation of financial data with hand gesture interaction was only evaluated in the 
possibility of data analysis regarding limited time and fulfilling tasks. 

An interesting gesture study involved the manipulation of different scale objects, 
rotating a house, and rearranging its rooms [11]. Authors explored how the scale of AR 
affects the gestures people expect to use to interact with 3D holograms. It was shown that 
one or two hands gestures were applied depending on the manipulated object size. In the 
case of large objects, the participants used both hands and, in the case of small objects, 
they did it with two fingers. The tasks were not complex and consisted of a sequence of 
separate gestures. The objects and work with objects were not analysed, only gestures. 
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2.2 Augmented Reality in an Industrial Setting 


The direct application of the AR system in industry is widely researched, yet seldom 
implemented. The possibility of using popular modern AR systems based on mobile 
devices such as smartphones/tablets and smart glasses (Apple ARKit, Google ARCore, 
and Microsoft HoloLens) in an industrial context was investigated in terms of local- 
isation quality in a large industry area [12]. The impact of using AR systems during 
device assembly instead of using a paper manual is also widely examined [13-15]. The 
most important limitation to be noted is the effort involved in creating a manual in AR 
compared to a paper one and considering the possibility of a serious mistake. Aug- 
mented Reality has also been tested in monitoring industrial flow processes. It has found 
application in drug diagnosis and simulations [16] or in-situ analysis and monitoring of 
measurement data analysis and monitoring [17]. As shown [18], most AR applications 
in the industry involve assembly processes by providing instructions to users on how to 
perform scheduled activities. These include remote assistance, improved user safety in 
industry space, or industrial process inspection & monitoring on site making. Some of the 
technologies involved relate to different sizes of displays (primarily tablets), projected 
AR views, and HMD use. 


3 Experimental Study Description 


The main goal of the study is to reveal what kind of gestures participants will use when 
conducting 3D data analysis in augmented reality and if there will be any differences 
depending on the context by recruited (n = 20) participants. The prototype AR app was 
based on 3D data model visualisation supporting baseline performance and enabling 
basic manipulations helpful for fundamental tasks performed with the data in normal 
conditions [19]. The chosen datasets were the electrical capacitance tomography (ECT) 
type [20, 21] for the gravitational flow in the silo-discharging process. Figure | presents 
both in-lab, in situ experimental space as well as types of projected AR visuals that can 
be treated as two types of interface alignments [22]. The observations and interviews 
were conducted during 20 experiment sessions. 15 sessions were conducted within lab 
conditions -- an empty classroom space. The remaining 5 took place in situ, at the 
semi-industrial tomography flow measurement lab. Each session started with a brief 
introduction to the tomography system and image interpretation in the context of the 
process. 

The participants were required to perform 4 tasks (as described in Table 1.), with 
no time constraints. They were encouraged to speak out loud about what they wanted 
to do and how, which allowed researchers to gather more data by making notes on their 
comments. Afterwards, semi-structured interviews were recorded, and the observations 
were archived. 
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Fig. 1. Visualisations of application used in laboratory (A & C) experimental conditions. A: in situ 
silo flow 3D AR projection; B: Horizontal slice of time series flow topogram; C: in lab tomography 
visualisations projected as free AR holograms. (Source: Personal collection) 


Table 1. Short descriptions of the four tasks prepared for each participant to complete. 


Task name | Description 


TO Involved verifying the user’s understanding of what is visible on the 3D data 
model. The participant was to determine the state of the process and show 
understanding of visuals’ colour spatial distribution in terms of low or high value 
of the flowing material concentration 


T1 The participants were given an opportunity to propose their own gesture for the 
interaction with the 3D model (moving and rotating the cylinder) and then to use 
the implemented gesture 


T2 The 3D image data had to be divided into separate portions, which allowed 
participants to look inside the 3D image of the silo space 

T3 The users were to use all the gestures they had learned earlier and add new ones to 
show the researcher how they would present and explain the silo flow process 
based on the 3D data 


4 Results Overview 


The results of the 4 conducted tasks lead us to indicate 4 main aspects of interactions: 
(i) location and position of user relative to the projected object, (ii) projected object 
displacements, (iii) object rotation and (iv) slicing & extracting sub-elements to get 
deeper insight into the projected visuals (as illustrated on Fig. 2). 

Each of the identified gesture and interaction groups was analysed further to look 
for patterns throughout the study session and form conclusions and recommendations: 


(i) localisation and positioning: obtained results have shown that participants were less 
eager to move around the cylinder rather than just looking at the cylinder. Occa- 
sionally, they tried to move closer to the visualization, yet no significant differences 
between lab and in situ industrial conditions were observed. 

(ii) displacement: Most of visual objects displacement moves were one-hand driven, 
except when in real in situ conditions where users tend to use both hands for 
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object grabbing and manipulation. Notably, we identified individual cases where 
participants utilized index finger pointing gestures for moving the object to the 
desired destination. 

(iii) rotation: Most of the participants performed this task with the implemented gesture 
of rotation by grabbing the object with two hands, yet a considerable portion of 
users tried to rotate the object with one hand as well. 

(iv) slicing & extracting: extracting single images from the stack of images (3D data) 
was observed in more complex tasks (T2 and T3). Again, for in situ conditions 
users tend to use both hands while generally one-hand interaction was preferred by 
the participants. Notably, slicing was the most diverse action in regard to preferred 
gestures proposed by the users. While single finger gestures were noticed for empty 
lab condition, no single finger gestures were observed in situ. 


w 
ag U 


Fig. 2. Visual representation of the actions of the user - moving and rotating the part of the 3D 
object during gestural interaction with 3D AR data. (Source: Personal collection) 


Figure 3 shows the mean rating of each category evaluated by users and the mean 
weight each category has, with the width of the bar indicating the weight [23]. Based 
on the graph we note that on average the physical and temporal demand were not the 
highest rated, we also can note that performance is the most highly rated and weighted 
category, which indicates that most users were preoccupied by their performance during 
the tasks. The frustration is also very highly weighted, which indicates that for a lot of 
users the frustration was important. Overall, mental demand is not rated highly, which 
is promising for implementation of the technology, as it shows that the complexity of 
the tasks did not increase because of HoloLens, although effort is one of the two highest 
rated categories. 


5 Discussion and Conclusions 


This exploratory study demonstrated patterns of possible use of HMD AR technology for 
the specific, industrial flow inspection application. Initial results revealed that behavior 
of a group of users while interacting gesturally with a virtual 3D data visual might be 
different for safe, lab conditions than for real industrial settings. In open space situations, 
where there are little to no potential risks while using the application, users tend to focus 
more on optimal solutions like operating with one hand. Analysis of data connected with 
object movement gestures shows that when creating an AR application for an industrial 
environment, it is important to implement grabbing functionality for both 1 and 2 hands. 
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Fig. 3. Bar chart showing the mean rating of each category measured in the NASA TLX and their 
mean weight. (Source: Personal collection) 


Some minor problems revealed in the study might have been associated with the 
following factors such as: big-scale silo and its projections size coupled with limited 
HoloLens display area; habits of using everyday touchscreen gestures designed for flat, 
tangible surfaces that are not fully transformable to in-the-air space; etc. In the context 
of designing for HoloLens 2, the focus should be on improving the comfort of work. 
Some users have reported that wearing goggles is uncomfortable. They had them on 
their heads for about 10—15 min. Long work in the goggles may be problematic for users 
due to the uncomfortable mounting on the head. 

The most frustrating problem for the user was the absence of recognition of the 
gestures they wanted to use. It made subsequent attempts more nervous. Despite a few 
shortcomings, users were enthusiastic about AR and how to use it. This points to the 
fact that along with the popularization of Augmented Reality in everyday life, they will 
show a desire to immerse themselves in it. Main conclusions derived from this study are 
as follows: 


e Significant difference between the gestural interactions performed by the participants 
within a neutral and an industrial environment was spotted in. 

e There are physical and technical limitations of HoloLens HMDs. Hence the context of 
working within the industrial settings must be considered when designing a particular 
solution. 

e There might be some benefits of simultaneous implementation of several different 
types of gestures for the same operation/command/task to accommodate the needs of 
different users. 


Results suggest further exploratory research on this topic is recommended. Revealed 
patterns show we can highlight a need to mix single- and two-hand- gestures while 
building applications for industrial use. Furthermore, it was noticed that some users treat 
working with a 3D dataset as working with a physical object, while others treat it as a flat 
touchscreen-alike visual. Some differences were observed when comparing the users” 
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behavior in a semi-industrial settings and in an empty room to be further investigated in 
the future. All the examined cases suggest the need to consider both the surroundings 
and the context, when designing augmented reality applications for industrial settings. 
Furthermore, it would be interesting to explore possible combinations of the gestural 
interaction with some other sensing technologies, such as EMG [24] or ultrasound-based 
[25], to involve some machine learning algorithms [26, 27] for optimising the mixture of 
gestural, voice and traditional input [28] as well as further explore eye-tracking modality 
to track attention and performance of the users [29-31]. 
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Abstract. We present the Polish adaptation of the Cybersickness Sus- 
ceptibility Questionnaire (CSSQ) by Freiwald et al. [4] and its short 
version. To validate the scale, we analyzed the results of 85 participants 
of the VR artistic experience “Nightsss” in versions with 3 and 6 degrees 
of freedom. Both long and short versions are characterized by high inter- 
nal consistency. The validity was analyzed by evaluating the difference 
between the scores obtained in the CSSQ-PL and the level of experienced 
cybersickness measured with the Simulator Sickness Questionnaire. The 
scale is suitable for screening participants in research conducted with the 
use of immersive virtual reality (VR). Further studies are necessary to 
confirm the predictive power of the scale. 


Keywords: virtual reality - cybersickness - immersion - questionnaire 


1 Introduction 


Cybersickness may be described generally as a bodily discomfort induced by 
exposure to VR content. It may be one of the enduring hindrances in order 
to use the full potential of VR technology. The symptoms of cybersickness are 
similar to the ones which people suffer from when they get motion sick. There is a 
number of different definitions of cybersickness. In general, cybersickness is a set 
of symptoms produced by VR exposure that temporarily affects the well-being 
of the VR user and impedes the use of VR [4]. The most common symptoms 
are nausea, vertigo, disorientation and pallor [10]. It is most often categorized 
as a form of simulator sickness [16], although there is a distinction between the 
experience of cybersickness and simulator sickness. The dominant symptoms are 
different, i.e., in cybersickness, the disorientation symptoms tend to occur more 
often. 


1.1 Measuring Cybersickness 


There are two types of cybersickness measures, objective and subjective. Objec- 
tive measures include physiological markers, such as bradygastric activity, res- 
piration rate [3,8], heart rate [15], galvanic skin response [5] and behavioral 
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measures, such as early termination of VR experience [9], delivery of incomplete 
task [15], impeding the further VR use [2]. Subjective measurements, on the 
other hand, are different multi-item questionnaires, such as Simulator Sickness 
Questionnaire (SSQ; [6]). The subscale measuring the experienced symptoms 
level consists of 16 items. It gives a possibility to generate a total sickness score 
and the levels of oculomotor discomfort, disorientation, and nausea [18]. It is 
commonly appreciated as a measurement of cybersickness in VR. 

Other measurements that are offered for evaluating cybersickness are Cyber- 
sickness Questionnaire (CSQ; [17]) and Virtual Reality Sickness Questionnaire 
(VRSQ; [7]). CSQ was developed to capture the occurrence of cybersickness in 
VR context, and because it is based on SSQ, uses the SSQ answers to esti- 
mate dizziness and troubles with focusing. The VRSQ is a questionnaire which 
is aimed to measure motion sickness in a virtual reality environment [7]. It is a 
modified version of SSQ, and consists of 9 items grouped into two components, 
namely oculomotor and disorientation. 


1.2 Predicting Cybersickness 


Cybersickness Susceptibility Questionnaire predicts the tolerance to VR expo- 
sure. The questionnaire consists of three parts and it is aimed to be used before 
the VR environment exposure. It was created to demonstrate the effect of biolog- 
ical, chemical and psychological occurrence of cybersickness [4]. The first part 
of CSSQ collects demographic questions, the second part evaluates the phys- 
ical health and fitness levels. There are questions aimed at investigating the 
headaches, stomach complaints, alcohol and substance consumption within 24h 
prior to the VR exposure as well as acute complaints, as all of these are the indi- 
cators of a higher probability of cybersickness occurrence [4]. The last part of 
the questionnaire evaluates the sensibility to motion sickness caused by various 
types of motion, such as driving as a passenger, traveling by train, or sitting on 
a roller coaster ride. 

Most of the questionnaires which measure cybersickness focus mainly on 
the outcomes, not on what triggers the problem to occur. It is important to 
combine both the physiological and psychological factors of cybersickness to 
better understand this phenomenon, as the occurrence of cybersickness is one of 
the reasons impeding people from the use of VR [14]. Moreover, there is some 
evidence that cybersickness may be negatively related to presence in VR. The 
sense of presence presumably suppresses cybersickness because the attention is 
directed to the unpleasant motions from the body [18]. Thus, in some studies, 
screening potential participants for their susceptibility to cybersickness may help 
reduce the risk of unpleasant or poor VR experiences for participants and the 
dropout rate for researchers simultaneously. 

The main objective of the presented study is to prepare the Polish version 
of CSSQ and test its psychometric characteristics to evaluate its usefulness in 
research and other VR applications, e.g., physiotherapy or education. 
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2 Method 


The adaptation of the scale was a part of a larger study on emotional reactions 
to immersive art. The study design was similar to that used by the Authors of 
the original scale [4]. The Nightsss experience has been chosen as a material to 
be shown to participants, as it contains two parallel versions, i.e., an interactive 
and a cinematic one [11,12]. The content of both was identical, and the difference 
was in the interactivity level. In the interactive version, the participants were 
allowed to walk freely in the 3D environment and have some interactions with 
objects, while in the Cinematic VR (CVR) version, they were only able to watch 
the experience in the form of a 360 3D movie. The procedure was conducted on 
the HTC Vive Pro Eye HMD with the wireless addon and the Intel I9 2.8 GHz 
PC with Gigabyte Geforce RTX 3070. The study obtained ethics approval from 
the Ethics Committee of the Institute of Psychology Polish Academy of Sciences. 


2.1 Participants and Apparatus 


Eighty five Caucasian (Polish) participants (26 male, 58 female, 1 nonbinary 
or other) took part in the experiment. The mean age was respectively 31, 31 
(SD =9 in both groups), and 22. Levene’s test for homoscedasticity of variance 
detected no differences (p = .296). 


2.2 Measures 


Polish adaptation of SSQ [1] was used. The original scale is divided into two 
parts: one with the questions about the current health state, alcohol, and 
medicine consumption and the other measuring the symptoms of simulator sick- 
ness, on 4 points scale, from none to severe. The results range was 0—179.5. We 
omitted non-diagnostic questions as the whole procedure was aimed to be not 
very bothersome for participants. 

CSSQ [4] consists of three parts. The first collects demographic data, second 
consists of questions considering the health and fitness of participants, includ- 
ing two questions on a 5-point Likert scale and 11 binary ones. The last part 
measures vulnerability to motion sickness with thirteen questions on a 5-point 
Likert scale from 0 - very rarely to 4 - very often (Table 1). 


2.3 Stimuli and Procedure 


After entering the vrLab, participants were asked to fill the first part of the Polish 
adaptation of the SSQ (demographic data, questions about health condition, past 
illness, alcohol, and drug intake, and sleep quality) [6] together with CSSQ-PL. 
Next, they were informed about the equipment, procedure, the possible effects of 
using the VR headset, and the collected data. Then the GSR and HRV electrodes 
were placed. In the first part of the VR experiment, the participants stood in 
an area limited to 1,5sq m and viewed short video clips of neutral, positive, and 
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negative valence from the Public Database Of 360 Videos With Corresponding 
Ratings Of Arousal And Valence [13]. Next, the participants saw either CVR 
standing in one place or interactive “Nightsss” where they were able to move 
around an area limited to about 7sq m. The experience lasted around 8 min. 
After removing the VR goggles, the participants were asked to fill out the SSQ 
and evaluate their feelings and opinion with open-ended questions. 


2.4 Validation 


The language adaptation to Polish was performed as follows: (1) the scale was 
translated by three independent researchers from English to Polish, (2) all three 
versions were compared to select the final version, (3) the subscale’s reliabil- 
ity and validity were tested. The questions with a binary answer scale in the 
“Health and fitness” subscale were coded as yes=1 and no=0. Reliability of 
the “Health & Fitness” subscale was tested by point biserial correlation between 
the item score and the total subscale score. The “Motion Sickness” subscale’s 
reliability was evaluated by analyzing the alpha Cronbach’s scores if an item 
would be removed (Table 1). Next, we calculated the sum of the “Health and 
fitness” and “Motion sickness” subscales as a sum score of CSSQ items excluding 
the demographic variables. We also calculated the total score of the SSQ-post 
and its subscales (Nausea, Oculomotor, Disorientation), following Biernacki et 
al. [1] to test the validity of the adapted scale. Next, we visually analyzed the 
distributions of the variables and normalized the variables to a 0-1 scale, and 
calculated the correlation matrix. 


3 Results 


3.1 Language Adaptation 


The final version of the Polish scale and the original scale published by Freiwald 
et al. [4] can be found in Table 1. Individual translations used to build the final 
scale can be obtained on request from the Authors. 


3.2 Reliability 


Due to the content of the “Health and Fitness” subscale, which refers mainly 
to the medical records and is constructed mainly on binary variables, the reli- 
ability analysis was carried out only for the “Motion sickness” subscale. The 
overall Cronbach’s a=.87, which indicated very high reliability. Cronbach’s a 
and items correlations per item can be found in Table 1. Based on the analy- 
sis, we also prepared a short version of the “Motion sickness” subscale, which 
included six items (selected on the basis of itemwise analysis to maintain high 
internal consistency - Cronbach’s a above .80). The subscale is characterized by 
high reliability (a =0.82) and can be used for a quick screening. Items selected 
for the short scale are marked with an asterisk in Table 1. 
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Table 1. Cybersickness Susceptibility Questionnaire - PL. 


English [4] Polish 
Demographic data Dane demograficzne Scale 
Age, Height, Gender, Ethnicity Wiek, wzrost, płeć, pochodzenie Freetext 


How often do you use VR? 


Jak często korzystasz z rzeczywistości 
wirtualnej? 


Likert scale 


Health and Fitness Zdrowie i stan fizyczny biserial | p-value 
corr 
How often do you suffer from a Jak często cierpisz na migrenowe bóle | Likert scale „162 < .001 
migraine headache? głowy? 
How often do you suffer from stomach | Jak często cierpisz na problemy Likert scale .188 < .001 
discomfort? żołądkowe? 
Did you consume alcohol in the last Czy spożywał _ś alkohol w ciągu tak/nie yes/no | .309 .004 
24h? ostatniej doby? 
Did you consume drugs in the last Czy zażywał _Ś narkotyki w ciągu tak/nie yes/no | n/a n/a 
24h? ostatnich 24 godzin? 
Did you consume medication in the Czy brał _Ś jakieś leki w ciągu tak/nie yes/no | .325 -002 
last 24h? ostatnich 24 godzin? 
Do you suffer from a cold or flu at the | Czy chorujesz obecnie na przeziębienie | tak/nie yes/no | n/a n/a 
moment? lub grype? 
Do you suffer from an ear infection at | Czy masz obecnie infekcję ucha? tak/nie yes/no | n/a n/a 
the moment? 
Do you suffer from a respiratory Czy chorujesz obecnie na jakąś chorobę | tak/nie yes/no | .084 AAT 
disease at the moment? układu oddechowego? 
Are you suffering from a lack of sleep | Czy cierpisz obecnie na niedobór snu? | tak/nie yes/no | .215 048 
at the moment? 
Are you suffering from an eye disease? | Czy cierpisz obecnie na jakąś chorobę | tak/nie yes/no | .257 017 
oczu? 
Have you been prescribed new glasses | Czy przepisano Ci ostatnio nowe tak/nie yes/no | .257 021 
recently? okulary? 
Do you suffer from a limitation of your | Czy masz zaburzenia układu tak/nie yes/no | n/a n/a 
vestibular system? przedsionkowego? 
Do you suffer from a limitation of your | Czy masz zaburzenia układu tak/nie yes/no | n/a n/a 
oculomotor system? okoruchowego? 
Motion Sickness Choroba lokomocyjna a if Item- 
Item Total 
Deleted | Corr. 
When I drive a car, I feel sick as a Jest mi niedobrze gdy jade Likert scale .850 712 
passenger samochodem jako pasażer /pasazerka.* 
Reading as a passenger makes me sick. | Czytanie w czasie podróży przyprawia | Likert scale 873 „515 
mnie o mdłości 
I feel sick when I am sailing. Likert Jest mi niedobrze podczas żeglowania. | Likert scale .858 588 
scale 
I feel sick on small boats. Jest mi niedobrze kiedy pływam Likert scale .856 .637 
łódką.* 
I feel sick when I go by train. Jest mi niedobrze kiedy jadę Likert scale .869 406 
pociagiem.* 
I feel sick when I go backwards by Jest mi niedobrze kiedy jadę tyłem do | Likert scale .859 605 
train. kierunku jazdy w pociagu. 
I feel sick when I ride the bus or am a | Jest mi niedobrze kiedy jade Likert scale .855 .673 
co-driver in a car. autobusem lub siedzę obok kierowcy w 
samochodzie.* 
I feel sick when I ride the bus sitting Jest mi niedobrze kiedy jadę Likert scale 851 -709 
backwards. autobusem tylem do kierunku jazdy.* 
I feel sick when I am in an airplane. Jest mi niedobrze podczas lotu Likert scale 868 408 
samolotem. 
I feel sick on rotating swivel chairs. Robi mi się niedobrze na krześle Likert scale 863 523 
obrotowym.* 
I feel sick when I ride a roller coaster. | Jest mi niedobrze podczas jazdy Likert scale .866 475 
kolejką górską. 
I feel sick when I ride a carousel. Jest mi niedobrze na karuzeli. Likert scale „855 .655 
I feel sick when swinging on a swing. Jest mi niedobrze gdy huśtam się na Likert scale .866 450 


huśtawce. 
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3.8 Distributions 


Overall, most participants reported very low to low susceptibility to cybersick- 
ness (M = 7.36, SD = 6.86) and very low to low cybersickness after the VR expo- 
sure (M = 16.37, SD = 15.93). The distributions were strongly left-skewed for all 
subscales (see Fig. 1). 
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Fig. 1. Distribution of subscales 


3.4 Validity 


Due to the skewness, we decided to test the predictive power of the CSSQ scale 
by applying the authors’ original approach. First, we normalized both CSSQ and 
SSQ total scores to 0-1 scale, and extracted the difference between the estimated 
susceptibility (CSSQ) and the obtained cybersickness levels (SSQ) to evaluate 
whether these scores matched. The variable had close-to-normal distribution 
with a mean of —.011 and standard deviation of .212. The mean was very close 
to zero and 80% of the participants had scores within one standard deviation 
(see Fig. 2). 


1.00 


LPL a a R ne ii E EAS EEE ENEE AE E PE AE EE gl 
l ell 


SSQ-CSSQ difference 
I 
© o 
N o 
u1 © 
= =—= 
Sar a 
=== s 

— 
= 
=>] 
1 
== 

| el 
=a 
= 
|. 

= 

m 

= 

= 

= 

= 

= 

= 

= 

= 

m 

m 

= 

= 

= 

= 

m 

LJ 

a 

LI 

LJ 

LJ 

a 

a 

a 

1 

1 

1 

1 

1 

1 


—1.00 
ID 


Fig. 2. Difference between the prediction (CSSQ) and the obtained cybersickness levels 
(SSQ) 
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Moreover, we tested the strength of the linear association between the sub- 
scales and demographic variables (Fig. 3). We found a strong positive correlation 
of SSQ with CSSQ (r=.58), a moderate correlation of gender with both CSSQ 
and SSQ. Analyzing the correlations between the CSSQ and SSQ subscales, 
we can observe that “Health & Fitness” subscale was weakly correlated only 
with SSQ total score (r= .32), while the “Motion Sickness” subscale correlated 
weakly to moderately with all SSQ measures. All subscales within the same con- 
struct correlated in the expected direction. Age did not correlate with any of 
the variables. Moreover, female gender and lower height were related to higher 
levels of motion sickness sensitivity and overall cybersickness, although the asso- 
ciation was relatively weak (r= .35x.03). Since the sample consisted of more 
women than men, and there was a strong negative correlation between gender 
and height, it is possible that the correlation with height was an artifact. 


4 Discussion and Future Directions 


We have demonstrated the reliability and validity of the Polish version of the 
CSSQ. Both long and short versions are characterized by high internal con- 
sistency. At the same time, there are no items to be removed to significantly 
improve consistency. Unfortunately, we also observed a strong skewness of the 
results. This may be due to the fact that potential participants were previously 
informed about possible contraindications for VR studies and resigned from par- 
ticipation at the recruitment stage. Unfortunately, no satisfactory solution to 
this limitation is available since a different approach would be against the rules 
of ethics. CSSQ-PL scores match the levels of actual symptoms of cybersick- 
ness. This makes the questionnaire a good tool for screening participants in VR 
research and applications, including gaming, physiotherapy, or education, and 
protects potential participants from unpleasant experiences. In future research, 
it is necessary to 1) recruit a much larger sample in order to at least partially 
reduce the impact of skewness of the distribution on the results and 2) select a 
longer and more diverse VR material in order to better verify the accuracy of 
the predictions. 
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Fig. 3. Pearson’s correlation matrix. Scales were normalized to 0-1. Visible correlations 
are significant at p < .005 
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Abstract. The electronic skin described in the article comprises screen- 
printed graphene-based sensors, intended to be used for robotic appli- 
cations. The precise mathematical model allowing the touch pressure 
estimation is required during its calibration. The article describes the 
recurrent neural network model for graphene-based electronic skin cali- 
bration, in which parameters are not homogeneous, and the touch force 
characteristics have visible hysteretic behaviour. The presented method 
provides a simple alternative to the models known in the literature. 


Keywords: Electronic skin - Graphene nanoplatelets - Force-sensitive 
resistors - Recurrent Neural Networks - Hysteresis 


1 Introduction 


Skin is the thin layer of tissue forming the natural outer covering of the body. 
It is a protective barrier against mechanical, thermal and physical injury and 
hazardous substances. It also acts as a sensory organ (touch, temperature, strain, 
moisture). The electronic skin (e-skin) and its applications in many domains have 
attracted researchers worldwide for over three decades [8]. E-skin as an interface 
that mimics a biological tissue is being developed in medicine (smart health care 
[18], prosthesis [20]), and robotics [3]). In robotics, it allows for increasing safety 
in human-machine cooperation, the agility of robotic manipulation [19], and soft 
robotics [2, 4]. 

To calibrate and establish the measurement characteristics of e-skin sensors, 
a standard approach base on the measurement of the force exerted by a ref- 
erence device. In the known research, linear, quadratic, and cubic polynomial 
models were fitted to data to calibrate the example sensor [5,9,21]. The best 
results were obtained using a Huber regression with a quadratic polynomial 
model. In [15] e-skin semi-automatised calibration procedure using industrial 
robot FANUC LRMate 200iC equipped with a reference force sensor was pre- 
sented where second-order exponential function and logistic function were used. 
To assess fitting results, two parameters were analysed: adjusted coefficient of 
determination (ARS) and root-mean-square error (RMSE). However, in all the 
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abovementioned research [15] the e-skin graphene sensor’s characteristics man- 
ifest hysteretic behaviour, and the visible effect of hysteresis on the modelling 
accuracy has not yet been discussed. 

Hysteresis is the dependence of the state of a system on the state’s history, 
and we can find this phenomenon in many science fields as physics, chemistry, 
and engineering. Besides the subject-specific models, some hysteretic models 
capture general features of many systems with hysteresis that can be classified as 
algebraic models, transcendental models, differential models and integral models 
[16]. One of the promising solutions is to use artificial neural networks to model 
the system’s data with hysteretic behaviour. They have distinct advantages over 
linear identification methods, i.e., the approximation of multivariable nonlinear 
functions, the simple gradient-based adaptation of model parameters and a rapid 
calculation of neural network equations. In contrast to analytical models, the 
design procedure of the ANN does not demand an exact knowledge of model 
physical equations and physical parameters that describe the model, but only 
values of model variables in the causal form. Several neural network architectures 
with recurrent layers and memory capacity were proposed recently, e.g. recurrent 
neural networks for ultra-capacitors [1], physics-informed deep neural networks 
for mechanical dampers [10] or extended Preisach neural network [6] among 
others. 

The presented research aims to propose a novel method of e-skin graphene 
sensor modelling using the NARX recurrent neural network that can describe 
the hysteretic behaviour of the sensor. 


2 Graphene-Based Electronic Skin 


The e-skin used for the research consisted of two layers [13,14]. The first is a 
conductive layer of comb electrodes printed on plastic foil and connected along 
columns and rows. In contrast, the second one comprises FSR graphene sensors 
arranged in a rectangular pattern placed on a plastic foil. In the research, a 
matrix with the size of a single sensor (approx. 5 x 5mm) was used. The e-skin 
controller measures the pressure for each sensor and transmits the position and 
touch force exerted on the active surface. The data is sent to a computer, pro- 
cessed and saved. The computer software enables the visualization of the touch 
results as a colour-coded image. Figure 1 presents the hand touch measurement 
for the 16 x 32 FSR matrix. 

The measurement acquisition setup comprised the FANUC LR Mate 200iC 
manipulator, the R-30iA Mate manipulator controller, the e-skin with a driver, 
the OnRobot Hex-e 6-axis force and torque measuring device with a controller, 
and a general-purpose PC. Data from e-skin sensors and the reference Hex- 
e sensor were acquired during the calibration procedure. Data acquisition was 
subdivided into the ‘loading phase’, when the force exerted on the particular 
sensor of the e-skin by the robotic arm increased, and the ‘unloading phase’, 
when the force decreased (Fig. 2). 
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Fig. 1. Example hand touch visualization for the size of 16 x 32 cells; Gaussian blur 
used for post-processing. 


Fig. 2. Illustrative representation of the calibration station: 1. LRMate 200iC manip- 
ulator, 2. reference e-Hex sensor, 3. robot tool, 4. e-skin electronic system, 5. e-skin. 


3 Neural-Network Modelling 


Nonlinear AutoRegressive eXogenous model (NARX) in the form of the recurrent 
artificial neural network (R-ANN) was used to model the e-skin graphene sensor 
to estimate the pressure (force exerted on sensor) based on the sensor readouts. 
NARX-RNN nonlinear model extends the autoregressive linear model with the 
exogenous input (ARX), popular in time-series nonlinear modelling. NARX- 
RNN model is the input-output model, where the output in each step is described 
by input signal with noise. NARX-RNN has memory capabilities. It memorizes 
previous data and can be used to model hysteretic behaviour. While making a 
decision, it considers the current input and what it has learned from the inputs it 
received previously. Output from the previous step is fed as input to the current 
step creating a feedback loop. 
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In such a case, the nonlinear part of the NARX-RNN model was described 
as 
ynn(k) = Fyn (hk) = f (V (k), V (k — 1), Fyn(k — 1)) (1) 


where ynn(k) - output of the NARX-RNN, Fyn(k) - estimated output (touch 
pressure) at step k, V(k) - sensor readouts at the step k, k - sample number, 
t = kT, - time, 1, - sampling time. 

Exemplary scheme of used in the research NARX-RNN model is presented 
in Fig. 3. 


input layer | nonlinear hidden layer linear output layer 


Fig. 3. Used in the research NARX-RNN architecture (2 neurons in the nonlinear 
hidden layer, D - delay (k — 1)). 


The NARX-RNN comprises the nonlinear hidden layer that processes the 
input data as 


yO ni(k) = FONU (2 Gov (®)) sf Gym € [-1, 1] (2) 
z (nyyL(k) = Bb) + Wp ny (E 1) + Di WM nai (A) 


where y()yyn(k) - output signal of the j-th neuron in the nonlinear layer, 
z0 (pNn(k) - input signal to the j-th neuron in the nonlinear layer, n - the 
number of neurons in the nonlinear layer, p - the number of neuron inputs in 
the nonlinear layer, MI - the weight of the i-th input to j th neuron in the 
nonlinear layer, «;(k) - i-th input to the network (with tapped delay line, D 
inputs in Fig.3), b - the threshold offset of the n-th neuron in the nonlinear 
layer. The second term of (2) describes the recurrent feedback loop with delay 
(D in Fig.3). In the considered case, the output of the RNN network y“,,(k) is 
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the output signal from a linear output layer, with 1 linear neuron described as 
follows 
2 2 2 2 
(ek = Wak Ge) Hee ER a 
zy (k) = by?) + >4 waji P hi (k) 

where ynn(k) - the network output, equal to the linear layer output, z) (k) - 
the input signal to the neuron in the linear output layer, M 4 - the weight of 
the ith input to neuron in the linear output layer, and by ©) - the neuron’s bias 
in the linear output layer. 

For the training and evaluation of the NARX-RNN model, the estimation 
error was calculated as follows 


ENARX-RNN (k) = ynn(k) — y(k) (4) 


4 Results 


4.1 Research Method 


The Mean Squared Error (MSE) has been used to describe the performance 
function ENARx-RNN of the NARX-RNN model during training and testing. 
The changes in the weights in the i-th iteration were used in the NARX-RNN 
models according to the Levenberg-Marquardt algorithm, and variable metrics 
method [12]. The training algorithm stop conditions were defined because of 
the possible large step of each iteration. The mentioned conditions are usually 
estimated as the assumed minimum value of the performance function and the 
maximum number of training iterations. In the case described in this article, the 
stopping conditions were: epochs = 1000, ENARx-RNN < 107%. The early stop 
method was used during training of the NARX-RNN model in a controlled way 
by segmentation of the dataset into three subsets namely: Training subset used 
during NARX-RNN training (70% of data), Validation subset used for NARX- 
RNN validation during training and to prevent the data overfitting (15% of data), 
Testing subset not used in the training phase, only used for comparison of the 
models during final evaluation (15% of data). The NARX-RNN models were 
trained per iteration in batch mode [7], while weights and biases were initialized 
using the Nguyen- Widrow initialization procedure [17]. The values of the sensor 
readouts and the touch pressure measured by the HEX device were normalized 
to the range [0, 1] to avoid the early stop due to neuron saturation. The quality of 
the NARX-RNN model of the e-skin graphene sensor was evaluated based on the 
MSE performance function, and the goodness of fit between the estimated data 
and the reference data was calculated as Root Mean Square Error (RMSE) [11]. 


4.2 Modelling 


The proposed NARX-RNN model was evaluated on the exemplary sensor (row 
6 and column 16 of the sensor matrix). The sensor-measured characteristic is 
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presented in Fig.4. The two types of nonlinear hidden layer neuron transfer 
functions were used: hyperbolic tangent sigmoid transfer function (tansig) and 
symmetric saturating linear transfer function (satlins). The RMSE values for 
the NARX-RNN models with the different numbers of neurons in the hidden 
layer before and after training are presented in Table 1. 


Sensor characteristics (row:6 column:16) © RNN model residuals (row:6 column:16) 


M 


o 
io 
1 
2 
o 
© 
© 


0.8 & 0.008 


Load 
Unload 


IJ 


z 


E-skin sensor output (normalized to range 0:1) ©) 


ALU 


0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 l: 200 LL m 500 600 800 
Force (Hex) (normalized to range 0:1) Sample number 


Fig. 4. (a) E-skin sensor measurement characteristic. (b) Residuals for e-skin sensor 
RNN model (1 neuron described with satlins function). 


Table 1. Quality indices of the RNN models for ‘loading’ and ‘unloading’ phases. 


NL function | Nyx | epochs | RMS Ficaa | RM S Eunloaa 

tansig 1 0 0.4322 0.3981 
1 187 0.0028 0.0030 
2 0 0.9629 0.9559 
2 68 0.0028 0.0027 
5 0 0.5648 0.5697 
5 76 0.0028 0.0026 

satlins 1 0 0.3582 0.3554 
1 14 0.0028 0.0030 
2 0 0.6814 0.6734 
2 17 0.0028 0.0030 
5 0 0.6354 0.6282 
5 21 0.0028 0.0026 
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4.3 Discussion 


Obtained results, presented in Table 1, indicate that touch pressure estimation 
improves after training around 200-300 times. The estimation errors do not 
depend significantly on the number of neurons in the hidden layer, even if only 
one neuron is used. Moreover, the number of iteration epochs is a few times 
lower in the case of satlins function describing neurons in the hidden layer 
(14-21 epochs), compared with tansig function (68-187 epochs). The obtained 
quality indices are similar for the ‘loading’ and ‘unloading’ phases, thus properly 
modelling the hysteretic behaviour of the sensor. 


5 Summary 


The article presents the possibility of touch pressure estimation of the e-skin 
graphene pressure sensors with hysteretic behaviour using NARX recurrent neu- 
ral networks. Increasing the number of neurons in the nonlinear hidden layer did 
not improve the generalization properties of the model. Moreover, the neurons 
described by the simple satlins function give similar results with fewer epochs 
than those described with tansig. 

The presented method provides a simple alternative to the e-skin graphene 
pressure sensors models known in the literature. Further research should be done 
to extend the developed model to the sensor matrix and the layered deep neural 
networks for e-skin calibration and data compression. 
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Abstract. One of the key issues in human-machine collaboration is 
human safety. Safe human-robot interaction can be implemented using an 
electronic skin (e-skin) that detects the human’s proximity to the collab- 
orative robot (cobot) casing even before the collision with the machine. 
The detection delay of such a situation should be as small as possible to 
be able to stop the machine safely. This paper presents an analysis of the 
results of estimating the proximity of a human to a robot with an elec- 
tronic skin placed on its surface. The proximity estimation system works 
by measuring the capacitance of an open capacitor, and the placement 
of the capacitor on the conductive robot case significantly affects sys- 
tem performance. This paper outlines which parameters have the most 
important influence on this performance. 


Keywords: Collaborative robotics - Electronic skin - Human 
proximity estimation - Open capacitor 


1 Introduction 


In recent years, an increasing number of areas can be observed that enable 
human-robot collaboration in the same workspace. The demand for such robotic 
applications is emerging in industry, education, agriculture, medical services, 
security and space exploration [1]. Cobots (collaborative robots) have become 
an essential component of Industry 4.0 [2]. Equipping cobots with additional 
sensors allows them to comply with special safety measures, avoid collisions 
with humans and manipulate objects more agilely. 

In order to ensure that the effects of collisions with humans are minimised, 
the construction used in cobots can be lightweight and compliant, and there is a 
reduction in the power and strength of such machines [3]. Another approach is 
to equip cobots with an electronic skin (e-skin), which will allow the detection 
of a human approaching the robot’s casing [4]. 
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Detection of human proximity can be implemented in a number of ways. 
For this purpose, the standard approach is to measure physical properties such 
as the reflection of light [5], the reflection of a sound wave [6], the reflection 
of an electromagnetic wave in radar [7,8], as well as the detection of changes 
in a selected parameter in the area around the sensor, such as the electrical 
permeability of the sensor [9,10], the magnetic permeability of the sensor [9], or 
the temperature of the sensor [11]. 

The aim of the manuscript is to present selected aspects of the operation 
of a capacitive proximity estimation mechanism for cobot enclosure. Capacitive 
proximity estimation is implemented by measuring the operating parameters 
of an open capacitor, one plate of which is an e-skin [4]. An important aspect 
addressed in the paper is the placement of the e-skin on a conductive machine 
case, which significantly affects the proximity estimation. 

The paper is organised as follows. In Sect. 2, the developed test stand consist- 
ing of e-skin placed on industrial robots metal case is presented. In Sect. 3, the 
tests results are described and analysed. Finally, Sect.4 provides the summary 
and further investigation proposal. 


2 Developed Test Stand 


The main components of the hardware setup are the e-skin together with the 
measuring controller, a Fanuc M10iA industrial robot and a PC. The e-skin 
together with its electronic measurement system is described in detail in [4]. In 
brief, the e-skin is implemented as a rectangular array of graphene force sensi- 
tive resistors (FSR). The touch pressure value is measured via a conductive comb 
electrode layer. The elecronical circuit for measuring the proximity of objects to 
the e-skin consists of a rectangular signal generator with a frequency dependent 
on the capacitance of an open capacitor formed by the conductive e-skin layer 
and a reference plate. The NUCLEO-F44RE board with an STM32 microcon- 
troller clocked at 90 MHz performs the counting of the number of periods of 
the rectangular signal in fixed time interval 7. The implementation consists of 
counting the occurrences of a rising edge via an external interrupt at a specified 
fixed time interval T. The length of this interval depends on the relevant settings 
of the microcontroller’s internal timer such as PSC (Prescaler), CKD (Internal 
Clock Division) and ARR (AutoReload Register). The default length of the time 
interval is set to 728,178 us. The PC performs the measurement data logging 
transmitted serially via the USB port. The principle of measuring the proxim- 
ity of an object to the e-skin is based on the properties of an open capacitor. 
The capacitance of an open capacitor changes when an object, with an electrical 
permeability different from that of a vacuum, comes into proximity. Changes in 
this capacitance cause a change in the frequency of the signal generated by the 
electronic circuit, which is measured indirectly by the microcontroller. Placing a 
conductive robot enclosure in the vicinity of a system operating in this way causes 
significant problems with proximity estimation. A previous study [4] confirmed 
the effectiveness of estimating the proximity distance of a human body part to 
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an e-skin placed on a dielectric material. As part of the research described in the 
current manuscript, the e-skin was placed on an industrial robot arm (Fig. 1). 
Bringing the hand close to the robotic arm mimics the working conditions of a 
cobot in which the robotic arm approaches a human. 


Fig. 1. Hardware setup consisting of: the Fanuc M10iA robot (1), the e-skin (2) and 
open capacitor reference plate (3). 


3 Research Results 


The tests conducted on the test stand were divided into two stages. In the first 
stage, the estimation of the proximity of the human hand to the e-skin at a dis- 
tance of approximately 20 mm and smaller was analysed without modifications 
relative to the tests conducted in [4]. In the next stage, modifications were made 
to the measurement system and the experiment was repeated. The modifications 
mainly concerned the operating parameters of the microprocessor chip used. 
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3.1 Initial Results 


The experiment consisted in approaching an open human hand oriented parallel 
to the surface of the e-skin, which was placed on the robot as described in 
Sect. 2. The size of the hand was about 10 cm x 20 cm. The approach velocity 
had a vector parallel to the normal surface of the e-skin, but its modulus was 
not recorded. The result of the measurement is the number of periods n of 
the signal generated by the electronic measurement system measured over a 
fixed time interval T. Changes in this number indicate a change in the relative 
electrical permeability in the area of the sensor which makes it possible to infer 
the detection of an object in its vicinity. During this measurement, the constant 
time interval during which the periods of the generated signal are counted was 
set to the default value and is 728.178 us. The results obtained are shown in 
the Fig. 2. The graph shows an experiment involving bringing a human hand 
close to the e-skin device. The e-skin proximity response with the measurement 
system configured by default is small and relatively difficult to distinguish from 
measurement noise. The determination of the signal-to-noise ratio (SNR) was 
calculated on the basis of the formula (1). 


SNR= (Aoignal ja (1) 


noise 


initial Hand Proximity Measurement 


Microcontroller Counter [ 16-byte reading] 


4000 6000 8000 10000 12000 14000 
Time [ms] 


Fig. 2. Initial proximity estimation results. 
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where: 
Asignal - root mean square aplitude of hand proximity signal, 
Anoise - root mean square aplitude of noise signal. 

In regards to initial results, the proximity of the hand to the e-skin Asignal 
was determined to be 10,282 while Ano;se was determined to be 1,090. As a result, 
SNR was calculated as 89,047. Such a result does not allow the estimation of 
the proximity distance, but is sufficient to dectect the presence of a hand in 
the vicinity of the e-skin. In order to improve the quality of the detection and 
allow the estimation of the proximity distance of the object, it was necessary to 
enhance the response of the system in relation to the noise present. 


3.2 Developed Improvement 


In order to improve the quality of the circuit’s response, the fixed time interval 
T in which the n periods of the signal generated by the electronic circuit are 
counted was increased. This was implemented by modifying the microcontroller’s 
internal timer settings. The value of the PSC prescaler was increased from 0 to 3 
which had the effect of increasing the value of T from 718 us to 2,872 ms. After 
the modifications were made, the experiment was repeated by bringing the hand 
close to the e-skin placed on the robot housing. The Fig. 3 shows an example 
of the resulting measurement system. As a result of the modifications, a more 
evident response of the e-skin placed on the robot housing to the proximity of a 
human hand was obtained. The determination of the SNR was calculated based 
on the formula (1). In terms of the example shown in Fig. 3 the hand proximity 
to eskin As;gnai was determined to be 66,338 while Anoise was determined to 
be 4,759. As a result, SNR was calculated as 194,335. This represents a much 
better result than the initial performance described in the Sect. 3.1. 


3.3 Discussion 


A summary of the measurement results obtained is shown in the Table 1. 


Table 1. Time reactions for e-skin pressure detection. 


Event Nnoise |NMmax | SNR T PSC 
Initial setup |6+1 |12+2) 89,047 | 718 us 0 
Improvement |6+1 |70+2 194,335 | 2,872 ms | 3 


The relatively small response of the system to an approaching hand is due 
to the increased capacitance of the open capacitor resulting from the presence 
of the conductor under the e-skin layer. This capacitance has been measured 
and is approximately 400 pF. In comparison, the change in capacitance due to 
the approach of an object with the properties of a human hand is of the order 
of a few pF. The change in capacitance caused by the object being targeted 
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Fig. 3. Improved proximity estimation results. 


for detection is two orders of magnitude smaller which results in little change 
in the frequency generated by the electronic circuit. In spite of these problems, 
an efficient detection of the proximity of the human hand to the robot was 
successfully achieved. 

The negative effect of improving the quality of the response of the e-skin 
placed on the metal housing of the robot to the approach of a human hand is 
an increase in the response time of the system to an approaching object. In the 
target application for safe human-robot collaboration, the measurement delay is 
a key determinant of whether the collaborative robot will have time to stop and 
avoid a dangerous collision. At the example TCP speed of an industrial robot 
of 4,000 mm/s, increasing the time T from 718 ws to 2,872 ms results, in the 
extreme case, in a robot response delayed by 2,154 ms which, at the above tool 
speed, translates into an increase in the potential braking distance of 8,616 mm. 
This extension of the braking distance is significant given the reaction distance of 
the e-skin to the human hand of approximately 20 mm. The increase in response 
time also results in a significant reduction in the potential for de-noising filtering 
of the measurement results. When applied to, for example, eliminate outliers, 
this will result in a further increase in braking distance. 


4 Summary 


Using e-skin for capacitance-based proximity estimation when placed near con- 
ductors poses problems. They are related to the occurrence of a much higher 
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capacitance of the open capacitor resulting from the presence of a conductor in 
the sensor's working area than from changes in capacitance resulting from the 
proximity of objects. This problem is very important, taking into account the 
target application of e-skin as a device supporting safe robot-human interaction 
- the housing of robots is often made of conductive materials. In the manuscript, 
in order to minimise the problem at hand, the time interval during which the sig- 
nal periods generated by the electronic system are counted was increased. This 
improved the response of the system and increased the SNR of the measurement 
system. The negative cost of obtaining such results is the increased response 
time of the robotic system. Further work on this topic may allow a more precise 
and robust estimation of the proximity distance of the human to the e-skin. 
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Abstract. The emergence of new areas of human-robot cooperation cre- 
ates the need to ensure human safety in this regard. Therefore, there is 
a need to develop new sensors to detect the presence of a human in 
the vicinity of a robot. One such sensor is an electronic skin (e-skin). 
Manufacturing and testing new e-skin prototypes is labor-intensive. This 
paper presents a software toolchain developed to simulate the operation 
of an e-skin used to detect human proximity. The toolchain is based 
on the finite element method and has been developed exclusively with 
free and open-source software. The presented toolchain makes it possi- 
ble to test e-skin modifications without the need for a physical prototype 
and significantly reduces implementation costs. The developed solution is 
multi-platform and allows parallel and multi-threaded calculations con- 
ducted on multiple machines simultaneously. This paper presents mod- 
eling results obtained for a simplified e-skin sensor, which are consistent 
with experimental results on the actual model. 


Keywords: Finite element modelling * Sensor prototyping * Sensor 
optimization * Electronic skin 


1 Introduction 


We are witnessing a revolution - robots, formerly reserved for industry and the 
professionals who work in it, are occupying more and more extensive areas of 
public space in, for example, medical services or education. The work of robots 
cooperating in spaces shared with humans should be supervised by various types 
of safety mechanisms and sensors. [1]. There is a number of concepts how to avoid 
collision. One of them is electronic skin (e-skin) [2], which was originally intended 
to mimic the sense of touch by a biological organ and was used in grippers and 
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end-effector mechanisms. Current research on e-skin can be divided into those 
aimed at providing reception of the same stimuli that are received by humans 
(e.g., temperature [3]), or extending the range of measurements to include those 
that are not received by human skin (e.g., magnetic fields [4]). At the Faculty of 
Mechatronics of the Warsaw University of Technology, research is being carried 
out on the production of e-skin, which, in addition to measuring the pressure 
force [5], also detects objects at a certain distance [6]. The time, cost and amount 
of work required to produce and test successive e-skin prototypes introducing 
improvements and new ideas is significant. 

The aim of this paper is to describe an open-source software toolchain devel- 
oped to simulate the operation of an e-skin electronic printed circuit using the 
finite element method (FEM). This will enable, among others, the testing of 
subsequent prototypes of the device without the need for a physical prototype. 
The main objective of this task is to improve the quality and development of 
estimating the proximity of human body parts based on e-skin. To this end, the 
mathematical modeling results developed in the paper are verified by laboratory 
test results. 

The paper is organised as follows. In Sect.2, the prototype of e-skin is pre- 
sented. In Sect. 3, finite element method and software toolchain is presented as 
well as the model of e-skin and the results obtained. Section 4 displays the effect 
of modelling results comparison with measurements. Finally, Sect. 5 provides the 
summary and further investigation proposal. 


2 Prototype of an E-Skin 


The prototype e-skin model used to measure the proximity of human body parts 
is described in [6]. E-skin consists of two layers of flexible foil 1. One has comb 
electrodes made of conductive material. On the second layer, resistive touch 
fields are made based on graphene. Touch estimation is based on measuring the 
resistance of the variable touch fields in relation to the touch pressure exerted 
on them. In [6] the principle of operation of the approximation estimation was 
implemented only with the conductive layer acting as one of the plates of an 
open capacitor. The second cover is a copper reference plate. The capacity of 
this setup changes when there is an object nearby with a different permittivity 
than the surrounding air. This change is measured using a custom electronic 
driver based on a rectangular signal generator. 

In order to confirm the quality of the developed toolchain for modelling 
the functionality of estimating the object’s proximity to the e-skin, a simplified 
model was proposed. A simplified e-skin model was built of two 110 x 140 x 
1.6mm plates placed on a plane. The plates, together with the surrounding air 
as a dielectric, constituted an open capacitor, similar to that described in [6]. 
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a) 


Fig. 1. a) Design principle for FSR matrix of size 2 x 2 [5]; b) the e-skin [6]: 1 - a 
conductive layer of comb electrodes printed on plastic foil, 2 - FSR sensors arranged 
in a rectangular pattern placed on a plastic foil. 


3 Finite Element Method for Simplified E-Skin Modelling 


Finite element method (FEM) [7-9], along with finite difference method (FDM) 
and finite-volume method (FVM), is one of numerical methods for solving dif- 
ferential equations numerically, which were originally defined on meshes of data 
points. The idea of mesh is to split the computational domain into a number of 
simple geometric elements, usually triangles (for 2D) or tetrahedral (for 3D), and 
to approximate the solution function over each element with the weighted resid- 
ual concept. Finally the global solution is obtained by combining the individual 
solutions. The origins of FEM date back to the early 1960s, and its first appli- 
cation was the analysis of aircraft structures. FEM has been rapidly developed 
over the years and nowadays it is considered to be the most flexible method [10]. 
It can be used for a wide range of numerical problems in hydraulic, mechanical, 
aeronautical [11] and electrical engineering [12] for solving problems involving 
irregular geometries and steep gradients. 


3.1 Toolchain 


Due to the huge popularity of FEM, the market offers many packages for FEM 
analysis. They differ in terms of their features, terms of operation, functionality 
and areas of application. In case of this project, the most important was the 
division of tools into commercial and open-source. Commercial packages enable 
completing the entire project within one environment, but the cost of the license 
is usually high. When deciding to use free solutions, it is often necessary to select 
separate software for the implementation of each of the individual modelling 
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stages: defining the geometry of the system, creating a mesh, performing the 
FEM analysis and graphical visualisation of the results. Figure 2a) shows the 
developed toolchain which consists of OCTAVE!, NETGEN?, ELMERFEM? 
and PARAVIEW*. 


control meshing 


Shaft geometry 


.geo file 


NETGEN 


Elmer mesh file 


FEM modelling 


„dat file „sif file (b) 


Octave }¢-———_ ElmerSolver 


visualisation 


ParaView 


| Results in fugures 


(a) (c) 


Fig. 2. Toolchain (a) for sensor FEM modelling with open-source software; e-skin math- 
ematical model, 1-water-box, 2,3-capacitor plates: (b) geometry (NETGEN), (c) mesh 
(ELMER). 


NETGEN is an automatic 3D tetrahedral mesh generator that contains mod- 
ules for mesh optimisation and hierarchical mesh refinement [13]. 

ELMER FEM is a finite element software for the numerical solution of partial 
differential equations and multiphysical problems [14], enabling the modeling of 
mechanical, thermal, electrical, and magnetic systems. 

ParaView is a data visualization application that enables qualitative and 
quantitative analysis techniques [15]. The data obtained were explored interac- 
tively in 3D, but it is also possible to use batch processing. 

OCTAVE is a high-level programming language developed under the GNU 
Project for scientific computing and numerical computation. It may be also used 


1 https://octave.org/. 

2 https://ngsolve.org/. 

3 http://www.elmerfem.org/blog/. 
4 paraview.org. 
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for batch processing and in presented toolchain it was used to automate a series 
of calculations. 

All programs were run in the LINUX MINT 19.1 operating system. Although 
each of them can also be used in Windows, they have been optimised for LINUX. 

Data flow between applications was carried out using text files. This solution 
allowed for the validation of the results at each stage of modelling and batch 
processing. The automation of the calculations was carried out by OCTAVE, 
which generated batch .geo files for consecutive variants of the system’s geometry, 
ran meshing in NETGEN, and then FEM computation in ELMER FEM. Finally, 
the modelling results were processed in the OCTAVE environment to create 
graphs and in PARAVIEW, which was used to visualise the distribution of the 
magnetic field. 


3.2 Modelling 


The effect of mesh generated from GEO geometry import is displayed in Fig. 
2 b). The model is set up of two capacitor plates of dimensions 110 x 140 
x 1.6mm placed on a plane, a 80 x 80 x 80mm cuboid of water as an object 
approaching an open capacitor. Those three elements were placed inside a sphere 
of atmospheric air with a diameter of 500 mm. The next variants of the modelled 
system differed in the distance between the water box and the plane of the 
capacitor plates. This distance varied from 1 to 50mm with an increment of 
1 mm. 

Fach of the 50 .geo models was meshed in the NETGEN batch processing 
environment. For the project described, Delaunay tetrahedral meshing mod- 
ules [16] were used. An example of the meshing result is presented in 2c). The 
maximum height of a tetrahedral element was set up to 0.005m, hence each 
final mesh consisted of 25553 points, 138243 elements, which established 21648 
surface elements. Then each of the meshes was saved in a format dedicated for 
ELMER FEM. The ELMER FEM environment provides an opportunity to solve 
Maxwell's Eqs. (1-4) that govern the macroscopic electromagnetic theory. 


v.D = p (1) 

v.B = 0 (2) 
M OB 

5 + OD 


where: V denotes the three-dimensional gradient operator, D is the electric flux 
density, B is the magnetic flux density and E is the electric field strength, and 
H is the magnetic field strength. 
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For linear materials the fields and fluxes are simply related with Eqs. (5) and 


(6). 


B = „A (5) 


D = Æ (6) 


where the permittivity € = €pe, is defined through the permittivity of vacuum €o 
and the relative permittivity of the material e,. In the modelled stationary case 
the electric field may be expressed with a help of an electric scalar potential ¢ 
according to Eq. (7). 

E = -Vó (7) 


Assuming linear material law and using the Eq. (1) gives Eq. (8) 
—V:eV$ = p (8) 


where p is the charge density. The energy density of the field is expressed by Eq. 
(9). 


1 


e=58-D= levo? (9) 


Thus the total energy of the field may be computed from Eq. (10). 


E= TEKL (10) 


where 9 is any volume with closed boundary surface. As there is only one poten- 
tial difference ¢ present then the capacitance C may be computed from (11). 


2E 

C= E (11) 
The StatElecSolve module by Leila Puska, Antti Pursula, Peter Raback was 
used for this purpose. This module enables for calculating the electrostatic poten- 
tial in a linear dielectric material and a conductive medium. Depending on the 
potential, different domain variables can be calculated as well as physical param- 
eters such as capacitance. The mesh definition together with the .sif file defin- 
ing the used constants, solvers and simulation boundary conditions constituted 
batch files for individual models calculated in ElmerSolver which is the mod- 
ule of ELMER FEM. In this project solvers using conjugate gradient-oriented 
algorithms were used. For electric potential Dirichlet boundary conditions were 
used to define the value of the potential on boundaries of capacitor plates (0 V 
and 3 V). The simulation results were saved both in .dat files (containing indi- 
vidual values of electric capacity of the system) and in .vtu files dedicated to 

PARAVIEW 3. 
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Fig. 3. E-skin modelling visualisation (PARAVIEW) - the electric potential distribu- 
tion. 


4 Tests 


The section contains a description of the tests performed to verify the correctness 
of the simulations’ results. For this purpose, a robotic workstation was developed 
using the Fanuc M10iA industrial robot. It goes on to describe the comparison 
of the results obtained by simulation and measurement. 


4.1 Experimental Setup 


For initial tests, a plastic 80 x 80 x 80 mm thin-walled box filled with water was 
proposed as an object approaching an open capacitor. The test object was sus- 
pended over a simplified e-skin model by attaching it with thin nylon lines to the 
robot’s handle. This allowed precise positioning of the object’s distance from the 
e-skin and eliminated the influence of robotic components on the measurement. 
Preliminary measurements were used to determine the number n of periods of the 
signal generated by the e-skin electronic driver depending on the distance of the 
test object from the e-skin. A dedicated electronic circuit described in [6] was used 
to determine this value. The circuit indirectly measures the frequency f depend- 
ing on the capacitance of the open capacitor based on a number n counted at a 
fixed time T of 720 us. As a result, each measurement point of the Fig. 4a) plot 
was determined from about 10,000 of recorded measurements of n values. 


4.2 Comparison of the Measurement Results with the Simulation 
Model 


The simplified e-skin model was modelled and simulated using the toolchain 
described in Sect. 3. As a result of the simulation, the model presented the capac- 
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itance of an open capacitor as a function of the distance of a container of water 
near it. In order to compare the measurement results with the simulation results, 
it was necessary to calculate the capacitance of the open capacitor based on the 
measured value of n. The relationship is described by the formula (12). 


== 
"R f(d) 


where: C - capacity [pF] of the open capacitor, d - distance [mm] to the object 
being approached, f = n/T - frequency [1/s] of the signal generated by the 
electronic system, k- experimental constant coefficient characterising the system, 
R- resistance characterising the system, k/ R = 0,00001 [pF/s]. 

The measurement results had the same character of changes as those obtained 
in simulations, but the dynamics of their changes differed significantly. Therefore, 
a linear function was developed to transform the measurements of the physical 
system to match the simulation results. The transformation was made according 
to formula (13). 


Cd) (12) 


Cild) =a: (C(d) +b) +c, (13) 


where: a = 50.0,b = —3.0,c = —7.5 - constant transformation coefficients; a is 
unitless; b and c in [pF]. 

Finally, a comparison of the simulation results obtained and the transformed 
measurement data is shown in Fig. 4b). 
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Fig. 4. (a) Test stand proximity measurements; (b) proximity simulation verification. 


5 Summary 


The mathematical modelling proposal presented in the article reduces the costs 
of creating prototypes and testing them in the laboratory. This enables faster 
acquisition of knowledge and reduced material consumption for producing proto- 
types for research. The developed toolchain allows for the optimisation of sensors 
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used in robotics (and not only) without the need to incur the costs of software 
licenses for FEM. Although all the described software is free, there is no tech- 
nical limitation to using it for more complicated geometry and for many more 
solvers for modelling other physical phenomena for the same project. ELMER 
FEM enables parallel computation - it historically used message passing but 
now is developed towards multithreading using OpenMP. ELMER FEM can be 
used on multiple cores, and it also supports several ways of geometry partition- 
ing. The only inconvenience of using the described environment is the need to 
master several tools and transfer files between them. However, using four dif- 
ferent applications makes it possible to optimise the workflow through parallel 
data processing on hardware with adequate computing power and an operat- 
ing system that works better with a particular application. As the developed 
toolchain is free, it significantly reduces the optimisation and implementation 
costs of the developed solutions. The software used to build the toolchain is also 
open source, which allows for code validation and modification required by the 
newly developed materials. Commercial solutions do not offer such possibilities. 
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Abstract. Every year, collaborative robots get closer to humans and 
cooperation with them takes place not only in industrial spaces, where 
specialized employees work with them, but also people who do not have 
knowledge in the field of engineering and robotics. Therefore, great atten- 
tion is paid to safety in the cooperation of robots and humans. In addi- 
tion, the aspect of ethics and their ethical behavior towards a human 
co-worker, companion or petitioner is more and more often taken into 
account. Knowledge of potential safety hazards is important to secure 
safety early in robots’ design and development process. Therefore secu- 
rity is one of main issues raised in the article. The most important safety 
standards from the point of view of collaborative robotics are presented. 
In the article described example of cobots acting increasingly role as 
members of our society. Access to them is becoming more and more com- 
mon - they are household members, waiters or airport staff. Presented 
in the paper issue of ethics in reference to robots and AI are becoming 
increasingly significant impact on human. It deals with topics of physical 
and ethical safety in cooperation between humans and robots. Reference 
has been made to the safety standards. Due to proximity of technology 
in humans lives, access to them, and even dependence on them, that 
issue was particularly emphasized by the author. The paper is source of 
references to considerations of human safety in robotized environments 
and ethics in robotics applications. 


Keywords: Collaborative robotics - Safety - Ethics - Robot Safety 
Standards 


1 Introduction 


Robotics has been playing an important role in industry for over 40 years. As 
statistics show, the development in the robotics industry is still very dynamic 
[14]. Information from [15] reports on the market situation after the pandemic 
indicate that the pandemic period, although initially associated with the sus- 
pension or postponement of some investments, was overall equally beneficial for 
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the industrial, logistics and other sectors. The interest in production automa- 
tion technologies, in particular digital solutions, has increased, and openness to 
innovation has increased. 

Increasingly, robots are used in space exploration, education, healthcare, 
intralogistics, and agriculture [3]. In recent years, the distance between humans 
and robots has narrowed. Positive changes in the perception of robots are notice- 
able in the research carried out on the acceptance of human-robot cooperation 
[12]. There is a noticeable change in attitude compared to the results of the 
[13,18] study conducted a few years ago on behaviour, emotions and attitudes 
towards robots. The research, described in the article [17], showed that assigning 
human characteristics to robots causes negative feelings due to a strong belief in 
the uniqueness of human nature. The negative attitude towards interaction with 
robots indicated the lack of acceptance and readiness for technological changes 
in society. 

Collaborative robots (so-called Cobots) (Fig.1) are a relatively new type 
of robots, more and more commonly used in production plants, laboratories, 
warehouses, etc. Robots also play the role of waiters, couriers and guides. The 
cooperating robots differ in terms of construction, weight, and tools with which 
they are equipped. This is due to the increased emphasis on safety issues [12] 
that must be ensured when the human and robot are in the same workspace. 
For this reason, it is so important to change people’s approach to collaborating 
with a robot, as the goal is to share a workspace. A robot that works safely 
between (or with) people can improve product flow and increase production by 
automating new spaces and processes. The combination is the most reliable and 
effective combination of a robot and a human. 


Fig. 1. Cobots 
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A great deal is currently being said about collaborative robots that work hand 
in hand with humans without any additional protective measures. The robots 
were designed to perform heavy tasks and it was definitely dangerous to enter 
the working area of a robot while it was manipulating a heavy element. Trends in 
robotization have been changing for several years (Fig. 2). The significant differ- 
ence in the way of operation between cobots and traditional robots is speed they 
work with. Whereas traditional robots are designed to operate at a high level of 
speed and accuracy, cobots are intended to be safe. Moreover they are easy pro- 
grammable and use. Due to high-speed work of the industrial robot arm, they are 
isolated from the workspace when operating to protect workers from dangerous, 
fast moving parts (Fig. 2 on the right). Collaborative robots are designed to work 
alongside people without barriers or fences, what is presented on the Fig. 2 (on 
the left). The cobots are able to immobilize themselves with the slightest touch 
preventing injury or any danger to nearby people, thanks to built-in sophisticated 
sensors. They assist human co-workers, accelerate tasks or assume monotonous 
and tedious work, leaving more complicated tasks to the human workers. Simul- 
taneously, they are much easier to set up. Industrial robots are proper for large 
companies that production is standardized and repeatable. Smaller companies 
can benefit from the flexibility and cost-effectiveness of cobots. 

Robot manufacturers set new safety standards that allow for unprotected 
collaboration. There is a very low level of tolerable risk which is considered 
acceptable when the robot is in the same environment as the worker. 


Fig. 2. Cobots vs. traditional Industrial Robots [source: https: //www.kuka.com] 


The paper aims to discuss the topics of physical and ethical safety in coopera- 
tion between humans and robots. In the dynamically developing field of robotics, 
the issue of safety is of particular importance. The observed trend is to construct 
robots that will be safe for humans. Due to the fact that cobots are intended 
to cooperate in an environment where accidental contact with it may occur. 
The essential is exploration of the potential hazards of co-operation while robots 
interact with humans closely. On the other hand, looking at the potential of 
AI and robotics, new technologies should be implemented in a sustainable and 
socially ethical way. 
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The paper is organised as follows. In Sect. 2, safety collaborative robots and 
human, collected information about safety standards in this area, is presented. 
In Sect.3, ethical issues are described in reference to the first British Ethical 
Standard. Finally, Sect. 4 provides the summary and further development con- 
sideration. 


2 Safety 


Due to the rapid development of collaborative robots on the market, their 
increased market share (in production spaces and at trade fairs), their pres- 
ence is no longer unusual. The features that distinguish them from traditional 
robots are: 


— a light structure, 

— compact design, 

— rounded edges, 

— hidden wiring, 

— different tooling than in case of traditional robots, 
— different functionality. 


Collaborative robots are equipped with a number of solutions that allow for 
the detection and response to a collision with operators and elements of the 
environment [3]. They move slower than traditional machines, which is also due 
to safety reasons. The possibility of easier programming is also a big difference. 
This opens up new possibilities for creating completely new scenarios of robo- 
tization applications, as well as increasing production efficiency. In the era of 
the employee market or due to other unexpected situations, such as a pandemic, 
cobots are more conducive to maintaining stability and continuing work in work- 
places. This is a strong argument for introducing robotic solutions. The goal is 
to develop companies and change the tasks performed by the production staff, 
rather than replacing them. 


2.1 Safety Standards 


Safety standards are developed to protect personnel from the risks associated 
with the nature of their work and their place of employment. They are formulated 
in such a way as to impose minimal restrictions or interference with the level of 
services provided. 

The selection of protective measures is related to all elements of the robotic 
system, i.e. the type and specificity of the robot itself, connection with other 
machines, equipment with which the robot and machines are provided [1]. Proper 
selection of the system components is essential for the work to be performed 
correctly, effectively and for the necessary changes to be made. 

The risk assessment of a robot and robotic application is based on the 
assumptions set out in the PN-EN ISO 12100 [1] standard. Due to the different 
nature of the hazards associated with various applications of industrial robots, 
the ISO 10218 standard has been divided into two parts [5,6]. 
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2.2 Collaborative Robot 


It is not efficient for the production process when a robot inside a cell fenced 
off from a human being during its active work, has to be stopped in the event 
that the operation requires human participation. This is the case with standard 
robotic applications. Cobots enable the use in industry of repetitive and efficient 
work of machines with individual skills and experience of the operator. This is 
important because people are able to approach the solution of the problem and 
task in a non-standard way, while cobots show high endurance, precision and 
strength at work. In other, non-industrial solutions, robots can even approach 
humans directly, being a waiter or a nurse. The common feature of all cobots is 
their safety in contact with humans. 

The challenge for cobot applications is to work without protective fences. 
The boundaries between the workspace of people and cooperating machines 
are completely or significantly limited. An additional disadvantage that must 
be taken into account is the unpredictable human movements. It is extremely 
difficult in the context of calculations related to speed, reflexes or unexpected 
entry into the work area of additional people. Standards for industrial robots 
ISO 10218-1 [5] and ISO 10218-2 [6] have been in force since 2011. The techni- 
cal specification ISO/TS 15066 [8] was created to supplement and regulate the 
safety requirements of cooperating systems robots and their work environment. 
It is the world’s first technical specification that focuses on the safety aspects 
of human cooperation with collaborative robots. It provides detailed guidelines 
and tips for conducting a risk analysis for collaborative robot applications. The 
idea of allowing a cobot to come into direct contact with a human when no 
pain or injury is caused, prompted the creation of a new technical standard. 
The ISO/TS 15066 [8] standard defines for the first time the limits of speed and 
power allowed during robot-human cooperation and the risk assessment of the 
application. 

Research on the e-skin prototype [9,10] could open up new possibilities at 
the human machine interface (HMI) level. The use of e-skin will increase the 
security of cooperation and improve the interface by collecting data, which will 
be processed into information about the type of contact. Studying the topic may 
help to read the emotions of people working with the robot [19]. 


2.3 Intralogistic Robotics 


Cobots, or collaborative robots, are a large group of AGV (Automated Guided 
Vehicles) products. Analysing the development of robotics and safety in 
human-robot collaboration, we should also mention ARM-class mobile robots 
(Autonomous mobile robots). They are another type of AGV robots. Their main 
task is to transport products using advanced technology and are equipped with 
artificial intelligence [20]. 

Workers’ concerns about AGV are similar to those about cobots. Even more 
visible and emphasised are the users’ concerns about security in this case. AGVs 
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Fig. 3. AMR, VersaBox, [source: https: //versabox.eu] 


are equipped with sensors and other security devices. The security system charac- 
teristic of AMR class robots is best defined in the harmonised standards PN-EN 
1525 [16], ISO 3691-4 [7]. The regulations mainly concern the speed of move- 
ment of the AMR robot (Fig.3): and the required security systems with which 
it should be equipped. The specific requirements are related to the two spaces 
in which the robot can work. 


3 Ethics 


In industry, robots commonly build, arrange, rearrange, transport, pack, and 
inspect things. For a long time, they have also been a visible support in medicine, 
e.g. they perform surgical procedures and dispense prescription drugs in phar- 
macies. Both medical robots and social robots establishing contacts with people 
evoke emotions and build relationships with the user. Referring to the [17,18] 
experiment on attitudes towards robots using the NARS scale, it can be con- 
cluded that a positive or negative attitude towards robots can be manifested 
through emotions, evaluations and reactions, and these have an impact on human 
well-being. These potential ethical threats are recognised as having a stronger 
and deeper impact than physical threats, so it is important to consider various 
ethical harms and countermeasures. 

Like most robots, social robots, cobots, AMR. use artificial intelligence to 
decide what movement to make in response to information received through 
cameras and other sensors. The ability to react in a way that seems close to 
human behavior has been trained through research. It forms a perception that 
is social and emotional intelligence. It studies how people can read thoughts, 
feelings and even touch [9,10]. 
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3.1 First Ethic Standard 


The British Standards Institution (BSI) has published the world's first ethical 
standard for the design, production, sale and use of social robots. 

Robots and robotic devices are increasingly used in industrial and non- 
industrial environments. The psychological factor is also taken into account as it 
takes into account how they affect the people with whom they share space and 
tasks. In addition to speech recognition, the use of vision systems to recognise 
emotions, new methods are also used. The distance between human and robot 
functioning is shortened (Fig. 4). 


Fig. 4. RCH patient, Miles, working with NAO [source: Alvin Aquino/RCH] 


BS 8611 [2] provides guidance on identifying the potential ethical harm of 
the growing number of robots and autonomous systems used in everyday life. 
The standard also provides additional guidance to eliminate or reduce the risks 
associated with these ethical risks to an acceptable level. The British Standard 
addresses ethical issues in social, application, commercial/financial and environ- 
mental terms. It includes four pages of examples of ethical risks, related ethical 
risks and mitigation measures (e.g. Table 1). Helpful comments are also included, 
with examples of how mitigation and reduction of negative impacts can be veri- 
fied. The standard collects the requirements and guidelines for the design of the 
robot, the protective measures used for protection and the method of developing 
clear information for the user. 

In [2] section of ethical hazard identification reference is made to groups of 
humans or animals that are likely to be affected by a new robot or application, 
despite the fact the definitions in clause do not refer specifically to animals. In 
subsequent subsection other standards for risk assessment are also referred to 
namely BS EN ISO 12100:2010 for machines, and BS EN ISO 14971 for medical 
devices. The conclusion of the risk assessment, BS 8611 stated that ‘As a general 
principal, the ethical risk of a robot should not be higher than the risk of a human 
operator performing the same action.’ 

Issues related to the idea of risk assessment are nothing new in safety stan- 
dards. However, the method of assessing robots in terms of ethical risks is an 
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Table 1. Example of ethical risk associated with societal and the robot application [2] 


Ethical issue | Ethical hazard Ethical risk Mitigation Comment Verification / Validation 
Societal Loss of trust Robot no longer | Design to ensure | In unexpected User validation 
(human robot) used or is reliability in behaviour 
misused, abused | behavior occurs, ensure 
traceability to 
help explain 
what happened 
Application | Inappropriate Malign or Model of Design robot to | Software verification; 
‘trust’ of a inadequate appropriate reach safe state expert guidance 
human by a human control human control with respect to 
robot any other tasks 
the robot is 
executing 


interesting addition to the set of tools that a designer, a robotics should use. 
The IEEE Standard Association, in 2016, launched a global initiative on ethics 
in autonomous systems and artificial intelligence. The result of a global initia- 
tive by the IEEE Standards Association was the document Ethically Aligned 
Design (EAD), which was based on information gathered from the wider public. 
In subsequent editions, the EAD documents expand on ethical issues and related 
recommendations. 

Faced the ethical dilemmas in robotics, likewise, the Als matter should be 
concerned [4]. The matter refers to role of AIs behave in Society, e.g. computer 
programs capable of making decisions for approving home loans. Including Als 
in daily decisions process for numerous profound and important questions it is 
an increasingly common practice. Robots are becoming increasingly involved in 
our daily lives. In [11,21] is extensively discussed AI and robotics of certain roles 
that ethics plays in the prosperity of humanity. The author [11] suggest that 
trust and cooperation play key role in this process. 


4 Summary 


The article presents the most important issues related to the development of 
robotics, because more and more challenges are faced by collaborative, social 
robots, AMR. They must be more reliable due to their responsible tasks, as well 
as a closer or even direct contact with people. The most important common 
feature of all these robots is their safe contact with humans. The presence of 
a robot forces the improvement of order, work organisation and changes the 
approach to the use of common space. The result is greater efficiency and greater 
work safety. 

The basic role of safety systems in machine control systems is to protect 
human health and life. In the face of the dissemination of the Industry 4.0 strat- 
egy, many machines receive new functionalities. Due to the dynamic development 
of robotics, the changing nature of cooperation, new ISO/FDIS 10218-1 and - 
2 standards are already being developed, which are to eventually replace ISO 
10218-1: 2011 and 10218-2:2011. 
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Another issue that is ethics in robotics is a major challenge to create a new 
generation of robotics standards. Ethical standards are a big step forward for 
people to trust new technologies. Without ethical standards, it will be difficult 
to gain universal trust and acceptance among the general public. Furthermore, 
ethical concerns can be incorporated into learning, planning and control algo- 
rithms. The issue of ethics, comparably to the safety, are currently considered in 
a very wide range, on various levels and in various fields of the new technologies. 

The last important point in the development of technology in the coming 
years will be the mutual communication between the robot and safety devices, 
which will enable information about the exact position of the robot, as well as 
the human. Currently, the robot knows where it is, but does not know who or 
what is approaching it until it collides. The safety systems also do not recognize 
the exact position of the robot, nor do they know exactly what obstacle they are 
dealing with. The situation would be completely different if the robot could com- 
municate with security systems and both knew their position and what obstacle 
was approaching it. This is the next step in the development of collaborative 
robots. 
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Abstract. In the face of the Fourth Industrial Revolution (Industry 
4.0), collaborative robots have become one of the key pillars of devel- 
opment. Thanks to their sensors, they allow increased flexibility and 
safety while working in a shared space with humans. One such sensor is 
the electronic skin (e-skin), which enhances human-robot collaboration 
through physical contact. This paper presents the developed versatile 
robotic workstation that allows, among other things, the calibration of 
e-skin touch measurements. In particular, the problems encountered with 
the use of a standard industrial robot are presented and ways to solve 
them are discussed. The presented approach allows the automatic acqui- 
sition of calibration measurements of e-skin sensors within the reach of 
the robot. 


Keywords: Collaborative robotics - Electronic skin * Test stand - 
Force-sensitive resistors - Robot Operating System 


1 Introduction 


For more than three decades [1] researchers around the world have supported the 
development of electronic skin (e-skin) technology. In robotics, the most impor- 
tant potential applications of e-skin are to improve the safety of human-machine 
collaboration and the agility of the robot [2]. E-skin devices currently being 
developed worldwide often allow the estimation of the value and location of tac- 
tile pressure [3]. There are also an increasing number of applications that allow 
the measurement of the proximity of human body parts to the e-skin surface [4]. 
Accurate testing and calibration is particularly important in light of the devel- 
opment in recent years of low-cost e-skin manufacturing techniques [5]. As the 
number of e-skin sensors increases, this problem becomes more relevant. As an 
example, previous study [6] performed an effective semi-automated calibration 
of e-skin, where the matrix studied had 16 rows and 18 columns, giving a total 
of 288 sensors. In order to calibrate and accurately test the e-skin, researchers 
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typically construct a stand to apply a precise pressure to one selected e-skin 
sensor. The most naive approach is presented by [7], where reference touch pres- 
sure is implemented in the form of weights of known mass manually placed at 
selected locations on the surface of the e-skin. One can find more sophisticated 
approaches using XY-axis tables [8] or a Z-axis platform [9] with a load cell. 
In [10], an interesting test rig was described to implement e-skin stretching mea- 
surements using a motor and linear gearbox. In the case of the heterogeneous 
parameters of the e-skin sensors, the calibration procedures carried out on the 
presented benches would be labour-intensive and difficult to repeat in practice. In 
the literature reviewed, no publication was found presenting an e-skin test bench 
allowing universal and automated measurements. This manuscript describes a 
robotic workstation designed for the calibration and testing of e-skin fabricated 
as an array of multiple sensors with heterogeneous measurement parameters. 
The developed stand consists of an industrial standard robot, a reference force 
sensor and a customised tactile tool tip. Thanks to the use of an industrial robot, 
the repeatability of the experiments performed and the ease of automation of 
the measurement process are ensured. Experiments with e-skin performed at 
the stand may include, for example, testing several hundred regularly arranged 
e-skin sensors for their calibration, measuring the reaction to touch in terms of 
normal and lateral forces exerted on the e-skin, testing the estimation of the 
proximity of objects to e-skin skin, and many other scenarios possible with the 
use of an industrial robot. To carry out these tests, it is not necessary to change 
the elements of the robotic workstation significantly, but only to change the 
control programs. The paper describes the solutions used to integrate the robot 
control system, the reference sensor measurement system and the e-skin. Prob- 
lems that may arise in this type of robotic application and the proposed way to 
solve them are discussed. 

The paper is organised as follows. In Sect.2, the developed robotic work- 
station and its software integration is presented. In Sect.3, the example mea- 
surements results are given. Finally, Sect.4 provides the summary and further 
investigation proposal. 


2 Developed Robotic Workstation 


The first part of this section describes the developed robotic workstation in 
terms of hardware. The use of an industrial robot as one of the components has 
a number of significant advantages. First and foremost, it allows universal use 
of the developed stand - it enables automation of the measurement acquisition 
process, ensures repeatability, precise setting of the pressure direction and, for 
example, testing the proximity of various types of objects to the e-skin due to 
the possibility of precise positioning of the robot. The next part of the section 
presents the workstation in terms of software integration. This task was the 
main challenge in developing the workstation. In order to ensure the required 
functionality of the developed robotic workstation and the possibility of using it, 
for example, for calibration, proximity testing, tactile touch testing, and other e- 
skin tests, it is necessary to ensure simultaneous robot movement and recording: 
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tactile pressure measurements from the e-skin, measurements of the force exerted 
from reference device and TCP position measurements of the industrial robot. 
Measurements from different components should be synchronized in time. This 
chapter presents an exemplary e-skin test performed on the proposed workstation 
that meets all the above requirements. 


2.1 Hardware Setup 


The main components of the robotic workstation are a Fanuc LR Mate 200iC 
manipulator with a Mate R-30iA control cabinet (CC), an OnRobot Hex-e 6- 
axis reference device with a controller for measuring contact force, and the e-skin 
with a custom controller [3]. The central device integrating the measurements 
and communication with the devices is a general-purpose PC. Figure 1 shows 
pictures of the robotic workstation prepared for data acquisition. 

As shown in Fig. 1, the Hex-e device (2) was attached to the end of the Fanuc 
robot kinematic chain (1). A custom tool tip with a compliant element (3) was 
then attached to the Hex-e. The use of the compliant element allows for a slower 
build-up of contact force with the robot’s standard positional control, and is 
an important part of the functionality of testing the tactile parameters of the 
e-skin. This is particularly important for highly sensitive tactile sensors (e.g. 
graphene-based). The e-skin (5) and its measurement controller (4) were placed 
within reach of the robot. 

For control and measurement acquisition, all devices (1, 2 and 5) were con- 
nected to a PC using the appropriate drivers. A USB port was used for the 
e-skin, while the Fanuc robot and Hex-e device were connected via Ethernet 
to two separate network cards. The general connection diagram of the robotic 
workstation devices is shown in Fig. 2. 


2.2 Software Integration 


In order to meet the measurement objectives associated with the station, time- 
synchronised acquisition of measurements from the e-skin, the Hex-e device and 
the position recording of the Fanuc robot is necessary. A major problem in 
automating measurements on a robotic workstation was the software integration 
of communication with component devices. The following part of the manuscript 
shows how to calibrate the coordinate systems of an industrial robot and how to 
record the robot’s TCP position while integrated with the measurement systems 
of the e-skin and Hex-e reference device. 


Robot Calibration. Before using the robotic workstation, it is necessary to 
define the tool and user frames for the manipulator. The tool coordinate frame 
is best defined so that the TCP (Tool Center Point) is unambiguous with the 
tactile tool tip being used. The user frame was then defined so that the origin of 
the coordinate frame was defined at one corner of the e-skin, while the direction 
of the X and Y axes are consistent with the columns and rows of the e-skin 
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Fig. 1. Illustrative representation of the developed robotic workstation [6]: 1. LRMate 
200iC manipulator, 2. reference e-Hex sensor, 3. robot tool, 4. e-skin driver, 5. e-skin. 


sensors. The tool and user coordinate frames were defined using three-point 
methods. With regard to the tool frame, it is important to easily physically 
identify the physical touch point of the tool tip. With regard to the user frame, 
it is necessary to identify physically with TCP the three points associated with 
the e-skin: the point defining the center of the corner sensor of the e-skin, the 
center of any sensor located in the x-axis direction, and any point located in the 
x-y plane of the e-skin. It is worth emphasizing that the TCP and the centers of 
the selected sensors should be easy to visually identify and available for physical 
touch. 
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Force and torque OnRobot HEX force 
sensor driver and torque sensor 


R-30iA FANUC LR Mate 
controller 200iC manipulator 


Computer 


Fig. 2. Connection diagram of developed robotic workstation. [6]. 


TCP Position Acquisition. For the purpose of communicating with the Fanuc 
robot and recording its position, packages of the ROS! environment in the Indigo 
version running on the Linux distribution Kubuntu were used. For integration 
with the Fanuc robot, ROS-Industrial libraries were used. The process of running 
ROS for the Fanuc robot is complicated. In addition to the standard installation 
of ROS on a PC, it is necessary to install packages designed to run on the Fanuc 
CC. The installation process is based on compiling and loading the files into the 
CC with the Fanuc RoboGuide v7.70 (v7.70P/53 7DA7/53) software. 

After network parameters configuration default ROS-Industrial package 
nodes allow to acquire only part of the status data from the Fanuc CC. This 
data includes information about joints angular positions. In order to capture 
the current position of the robot's TCP, it is necessary to calculate it based on 
the kinematic parameters of the robot. In addition it is necessary to manually 
publish the custom tool frame tf transformation. 


Hex-e and E-skin Acquisition. To implement measurements from the Hex- 
e device, a dedicated device controller was used along with a provided sample 
program in C++ implemented as a ROS node. In terms of measurements from 
the e-skin, the exact details of the measurement controller are described, for 
example, in [3]. From the programming perspective, multi-threaded integration 
of measurements with the ROS node responsible for measurements from Hex-e 
was implemented. 


1 https://www.ros.org/. 
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Example Application. The example developed application is designed to col- 
lect the data necessary for e-skin calibration. To do this, it is necessary to exert 
pressure on successive e-skin sensors and simultaneously record measurements 
from the e-skin and the reference Hex-e sensor. A recording of the robot’s position 
is not crucial for determining calibration, but it can be helpful in distinguish- 
ing between sensors, the loading and unloading phases of the exerted pressure, 
and thus in analyzing the hysteresis of e-skin sensors. The developed application 
generally consists of a PC-based ROS server program written in Python (1), a 
robot motion program (2) and ROS client for the robot both running on the CC 
(3), and a PC ROS node responsible for data collection (4). It is worth noting 
that there is a limitation on the number of simultaneously running programs 
in the robot CC. In order to simultaneously run the robot movement using (2) 
and the ROS client (3), it is necessary to set the “User Tasks” parameter in 
the CC configuration high enough. The robot’s motion program (2) is respon- 
sible for executing a repetitive pressure trajectory on successive e-skin sensors. 
A ROS client (3) is additionally run on the robot’s CC, responsible for sending 
messages about the robot’s current state to a server (1) running on a PC. The 
robot’s state, depending on the needs, can describe the robot’s position or, for 
example, operating states such as: downward motion, upward motion or travel 
between sensors. The ROS server on the PC (1) receives the robot status mes- 
sages sent by the robot’s client program (3) and publishes a message on the 
corresponding ROS topic to record or not the measurement data. This is espe- 
cially important to avoid large volumes of unnecessary measurement data being 
recorded. The node responsible for data recording (4) listens for messages on 
the ROS topic and starts recording measurement data of the e-skin, the Hex-e 
reference device and the robot’s position into files at appropriate times. 


3 Example Results 


Figure 3 shows selected measurement results from the operation of an example 
application on the developed robotic workstation. It is worth noting that, for 
example, data from the e-skin and the Hex-e device were acquired with differ- 
ent sampling frequencies: approximately 50 samples/s for the e-skin sensor and 
approximately 10 samples/s for the Hex-e sensor. Before using these measure- 
ments, it is necessary to solve the problem of different acquisition frequencies, 
e.g. by resampling. Figure 4 shows data from the selected sensor pressed during 
the test. 
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Fig. 3. Simultaneous acquisition of measurements from e-skin, Hex-e and Fanuc robot 
for five e-skin sensors. 
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Fig. 4. Simultaneous acquisition of measurements from e-skin, Hex-e and Fanuc robot 
for one e-skin sensor. 


4 Summary 


The manuscript describes the hardware and software framework for a e-skin 
versatile robotic workstation based on a standard Fanuc industrial robot. Using 
the example of the e-skin test described in Sect.2.2, it was shown that the 
proposed workstation meets all the requirements to provide the required e-skin 
testing functionality in various scenarios. All the devices included in the station 
were integrated and work with the use of the PC and ROS programming system. 
Functional tests on the workstation showed that it could be used, for example, to 


Versatile Robotic Workstation for Electronic Skin - Problems and Solutions 277 


calibrate the normal pressure of e-skin sensors characteristic. The application of 
the workstation, however, is much more versatile and can be used, for example, 
for testing sheer touch force e-skin reaction [3] or proximity estimation [4]. 

As outlined in the manuscript, the use of a standard industrial robot, e.g. 
Fanuc, is possible; however, its integration with the standard ROS programming 
system used increasingly in robotics development is a problem on its own. In 
the future, it is planned to replace Fanuc robot with the ES5 from EasyRobots 
company”, which has built-in integration with the ROS system. 

The application of robotization in the field of e-skin testing leads to signifi- 
cant time savings (the acquisition of measurements from several hundred e-skin 
sensors can take several hours [6]) and improves the quality of measurements 
due to the repeatability achieved with an industrial robot. In order to further 
develop the workstation, it is worth solving the problem of manual calibration of 
the robot against the e-skin to further deepen the automation of the workstation 
and the associated benefits. 
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Abstract. Virtual reality applications are becoming more and more popular. In 
addition to apparent uses like providing entertainment, VR applications are finding 
use in fields such as education, engineering, and architecture. The market for 
VR games used in medicine and sports is also thriving. The applications allow 
monitoring of an athlete’s progress, training advanced movements specific to a 
given sport, and are difficult to reproduce during traditional training. A significant 
advantage of this type of solution is the increased safety of the user and, thus, 
a lower risk of injury. The article presents a VR application designed for the 
training of a powerlifting triathlon. This sport consists of three exercises. They are 
performed by both strength athletes and those training in other sports to prepare for 
the season properly. Due to the fact that they are simple multi-joint exercises, they 
fall into the collection of exercises often performed by amateur trainers. Despite 
their significant popularity and the undoubted advantages of performing them, it is 
often observed that they are performed incorrectly, which significantly increases 
the risk of injury. The purpose of the application is to enable safe training and 
learning of correct movement patterns of powerlifting exercises, regardless of the 
user’s level of proficiency. 


Keywords: Powerlifting - VR application - VR in sport 


1 Introduction 


Powerlifting is a discipline that belongs to the strength sports group. It consists of three 
exercises: a squat with a barbell on one’s back, a bench pressing in a supine position, 
and deadlifting [1]. Significantly, exercises specific to this sport are performed not only 
by powerlifters but also by bodybuilders, strongmen, CrossFit practitioners, and during 
rehabilitation [2] (e.g., deadlifting is used to stabilize the spine in cases of discopathy 
[3]). These movements are also treated as a complement during preparatory training 
for all kinds of sports (such as snowboarding, skiing, or cycling) [4]. Due to the large 
number of applications of these exercises and their sophistication, they are associated 
with a significant number of injuries. For triathletes alone, the statistics are 1.0—4.4 
injuries per 1000 h of training [5]. For CrossFit practitioners, the injury statistics are 0.2 
to 18.9 injuries per 1000 h of training, with as many as 8.7% requiring surgery [6]. The 
reasons for this phenomenon are found in too much load, too little rest time between 
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sets, and incorrect technique of performing these exercises. The last reason is important 
because it can be reduced by improving exercisers’ technique and movement patterns. 
This paper proposes a solution based on learning movement patterns using a virtual 
reality application. 

VR applications supporting training and rehabilitation are increasingly common in 
the daily functioning of athletes [7], physiotherapists [8], and ordinary non-specialists. 
They allow both to conduct enjoyable workouts in a diverse environment (which is 
extremely useful, for example, in the physiotherapy of children [9]) and provide imme- 
diate feedback and the ability to monitor their progress [10—12]. In addition, for profes- 
sional athletes, are used to analyze and optimize the movement patterns used [13-16]. 
Furthermore, the argument for the research is the variety of sports and exercises and 
other health and movement fields (such as applications for training surgeons) for which 
applications using virtual reality have already been created to teach [7, 17]. The literature 
data mentions the use of VR technology to assist strength athletes, however, mainly in 
the motivational sphere. Unknown to the authors are solutions for learning the movement 
patterns of powerlifting using VR applications [18]. 


2 VR Application Development 


The application’s main purpose is to allow one to safely learn the exercises included in 
the powerlifting or correct previously acquired, incorrect movement patterns — perform- 
ing exercises without load will not cause injury. The critical element in these exercises 
is the correct handling of the barbell, which was taken into account when creating the 
application. In a significant number of cases, according to specialists, this is a sufficient 
condition to reduce the frequency of injuries resulting from incorrect exercise signifi- 
cantly. This statement is supported by the fact that the guidance of the barbell forces 
the body position of the exerciser. After consulting with a powerlifting coach, this was 
accepted as a sufficient condition for the VR application. 


2.1 VR Application - Selecting the Type of Training 


A VR game was created that met the outlined objectives. The player can choose a specific 
exercise. In the first stage, a video showing the correct performance of the exercise is 
displayed. 

Then the player can choose one of four levels of difficulty. Levels of difficulty differ 
in the number of visible elements that determine the correct trajectory of the barbell and 
additionally allow the player to monitor his progress. 


2.2 VR Application - Proper Training 


Before performing an exercise, the player is informed of the correct starting point. For 
standing exercises (deadlift and squat), it is the virtual barbell’s position (Fig. 1). 

The user’s position on the bench is additionally controlled by the horizontal bench 
press. During the execution of the exercise, the path of the barbell movement is examined, 
and the transition along the correct trajectory is checked, as well as the maintenance of 
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Fig. 1. Squat training station with a barbell, with visible starting points 


the horizontal position of the barbell (required for each trained exercise). In addition, as 
the user performs repetitions, advice from a “virtual coach” appears, including critical 
elements of each exercise. The user performs five repetitions of an exercise. A special 
counter helps him control the number of repetitions. After completing the appropriate 
number of repetitions, the player proceeds to check the results. 


2.3 VR Application - Results and Progress Monitoring 


The results are then assigned to a 5-grade scale and presented to the player in an iconic, 
currently popular, and easy-to-understand manner. The results are displayed in three 
tables. The first gives the user feedback on each of the repetitions for a given exercise, with 
easy-to-understand graphs. The number two board describes how to improve exercise 
performance. The last board graphically depicts the overall averaged training score in a 
humorous way. 


2.4 VR Application - Environment 


The entire training room is located in a mountainous environment (Fig. 2) which is meant 
to make the learning process even more enjoyable. In addition, the app includes moti- 
vating voice comments and elements to increase mental immersion. All these elements 
aim to extend training time by making it more attractive. 
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Fig. 2. Environment of application 


3 Conclusion and Further Steps 


A game was created that allows players to improve their technique in powerlifting 
exercises. At the time of writing, the prototype application has been tested by a strength 
triathlete coach. The testing results are auspicious, and necessary corrections have been 
made, so the application is ready for further research on a larger group of people. In 
addition, the use of VR games for learning or simulating sports [19, 20] and rehabilitation 
[8] allows us to assume that the application has a significant case to meet its goal, which 
is to reduce the number of injuries during training. The prototype application has also 
received positive feedback from physiotherapists. 

The next step of the work will be to conduct tests on a larger number of users 
with different proficiency levels. The tests will be conducted under the supervision of a 
powerlifting coach. The test subjects will be divided into two groups. Each participant 
will get to watch a demonstration video at the beginning. Subjects in group one will 
then perform each of the exercises (at an adequate, safe load for the user's strength) with 
no prior in-game training. Participants in group two will do the in-game training first 
and then the gym training second. At the very end, the progress will be evaluated by the 
trainer. Thus, the real impact of performed exercises in virtual reality on the correctness of 
authentic movement patterns will be analyzed. The impact of the simplifications adopted, 
including analysis of the barbell movement alone without reference to characteristic body 
parts, on the correctness of the training performed will also be examined. The results of 
the study will be described in subsequent articles. 
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Abstract. Alpha-XR Mission conducted by XR Lab PJAIT focused on 
research related to individual and crew well-being and participatory team 
collaboration in ICE (isolated, confined and extreme) conditions. In this 
two-week mission within an analog space habitat, collaboration, objec- 
tive execution and leisure was facilitated and studied by virtual reality 
(VR) tools. The mission commander and first officer, both experienced 
with virtual reality, took part in daily briefings with mission control. 
In the first week the briefings were voice-only conducted via a channel 
on Discord. During the following week last briefings were conducted in 
VR, using Horizon Workrooms. This qualitative pilot study employing 
participatory observation revealed that VR facilitates communication, 
especially on complex problems and experiences, providing the sense of 
emotional connection and shared understanding, that may be lacking in 
audio calls. The study points to the need to further explore VR-facilitated 
communication in high-stake environments as it may improve relation- 
ships, well-being, and communication outcomes. 


Keywords: remote collaboration - virtual reality - communication 


1 Rationale 


In high-stake environments, where status needs to be reported carefully and often 
the mode of communication plays a crucial role. The choice between different 
communication spaces: audio, video or VR, may affect the quality of communi- 
cation, relationships and well-being of the participants. Especially in the context 
of manned space flight communication between the crew and Mission Control 
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(MC) this choice may greatly impact goal completion, as problems encountered 
in space are analysed, simulated and solved by specialized teams on Earth. 

To explore this topic we devised a pilot study to observe how the quality of 
briefings may change if they are moved from the audio space to virtual reality. 
In a two-week simulated Lunar mission we had the commander and the first 
officer take part in daily briefings with MC. In the first week the briefings were 
voice-only conducted via a locked channel on Discord. During the following week 
last briefings were conducted in VR, using Horizon Workrooms. This qualitative 
preliminary study employing participatory observation gathers insights, consid- 
erations and future research directions by exploring such research questions as: 


1. To what extend does the technology, audio transmission and VR constitute 
a barrier to communication? (explored in Sect. 3.1) 

2. What are the differences between the impressions of participants of remote 
audio and VR briefings? (explored in Sect. 3.2) 

3. What is the potential of using VR-communication to improve the well- 
being of people experiencing ICE (isolated, confined and extreme) conditions? 
(explored in Sect. 3.3) 


1.1 The Role and Forms of Communication 


Communication is a form of exchanging information, thoughts and feelings. It 
can be job related, inter-team coordination, or used by mission control to man- 
age the performance of crew members [4]. Communication can be an important 
factor in isolation missions, stabilizing the well-being as a carrier of relations 
with friends, family or a wider group of associates [22]. The level of copres- 
ence during mediated communication (usually higher in rich media like VR) 
influences well-being [21]. Natural communication usually takes place in a mul- 
timodal setting, integrating many senses such as sight, hearing and touch. The 
sender naturally uses those to encode the message not only in the words or the 
way they are uttered, but also in their posture, gestures or emotions that are 
conveyed through facial expressions [1]. Traditional media - like email, telephone 
or videoconference - negatively affect the process by filtering out some informa- 
tion carriers [23], which matters, as we sometimes send contradictory signals 
(e.g. when hurt, we say “it’s fine”). In Mehrabian’s study, only 7% of the actual 
message is attributed to words, 38% is vocal and 55% is the body language [13]. 
Moreover, communication between diverse stakeholders who experience differ- 
ent conditions, such as being under stress in a quick response scenario, is also 
challenging. While there exist communication protocols and best practices, a 
shared mental model based on prior training and relevant experience in the per- 
sonal, interpersonal, and institutional dimensions is necessary to foster effective 
communication and collaboration, facilitating relationship-building and shared 
understanding [8]. 


1.2 ICE Conditions in Human Space Flight 


The subject of human exploration of outer space raises an increased interest 
in the safety of astronauts staying in the hostile environment of a space mis- 
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sion. Unique factors associated with it, such as microgravity, altered day-night 
cycle, sense of danger, isolation and confinement, affect human behavior, perfor- 
mance and well-being, but their long-term effects are still not well understood. 
ICE (Isolated, Confined and Extreme) environments are therefore used for their 
exploration, combining presence of various type of stressors to uncover com- 
pound effects they might have on the human being [24]. Experience shows that 
the impact of ICE conditions is often underestimated by both the crew and the 
support staff [3]. The planned return of man to the Moon, and in the longer 
term, plans to place a man on the surface of Mars, prompt scientists to look 
for ways to obtain the data, knowledge and experience necessary to prepare the 
crews for the mission. Building analog habitats allows to simulate selected mis- 
sion conditions in an artificially created environment. Depending on the project, 
the emphasis is on study of life support, food self-sufficiency or isolation issues 
[19]. 


1.3 VR for Collaboration and Work 


The use of Virtual Reality (VR) as an environment to meet together, whether 
for entertainment such as computer games or for professional purposes such as 
working on a joint project is called “Social VR” [14]. Trials showed that, if the 
process is properly prepared, e.g. the proper division of roles and competences is 
ensured so that each team member knows what tasks are ahead and how to use 
the tools provided, the use of VR is well received and positively influences the 
sense of community in the team [9]. A virtual multi-user experience (VMUE) can 
also be used as relatively low cost means of simulating the concurrent activities 
of crews interacting with a simulated work environment. VR applications include 
training, as well as research on team behavior and interaction during simulated 
task loads. It can be used to simulate unfamiliar environments, low-gravity con- 
ditions or complex equipment maintenance, enabling true collaboration, not just 
dry procedural training [12,20]. ESA is exploring the potential of VR and Aug- 
mented Reality (AR) also to increase the safety of teams working during missions 
- the EdcAR project could reduce the number of mistakes during procedures, 
such as the need for medical assistance when an on-board medic needs help [7]. 
The recent COVID-19 pandemic gave also an opportunity to explore mental 
health issues exposed by isolation [5] and the potential use of VR. as a tool to 
reduce anxiety, depression, perceived stress, and hopelessness while increasing 
social connectedness [17]. 


2 Method 


2.1 Simulated Lunar Mission and Its Crew 


The research team engaged in a simulated Alpha-XR Mission conducted by XR 
Lab PJAIT between the 27th of April 2022 and 13th of May 2022 at an ana- 
log space habitat. It was initiated by a 3-day training, followed by two weeks 
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of confinement in ICE (isolated, confined and extreme) conditions. The partic- 
ipatory research conducted during the mission related to individual and crew 
well-being and team collaboration. It was facilitated and studied with the use of 
virtual reality tools, questionnaires and ethnographic methods. There were five 
study participants, who, as crew members consented to the research conducted 
as part of the Alpha-XR mission, and other research ongoing in the analog space 
habitat. All of the crew members were quite comfortable with information and 
communication technologies. The two commanders who took part in this partic- 
ular study were very familiar with VR, having worked with it in a professional 
setting before and owning an Oculus headset. 


2.2 Briefings with Mission Control 


Every day of the mission, from the flight, lunar landing and the following EVA 
missions to return to Earth started with a remote briefing with the Mission 
Control team on Earth. The briefings would start at 9:00 AM and last about an 
hour, depending on how many things had to be discussed and who took charge 
of the meeting (either mission control members or the two commanders). Mis- 
sion Control briefings started with the commanders reporting on the status of 
the habitat, including the water and energy consumption, graywater levels, food 
supply. Information about the crew was also included, such as calorie intake, 
health concerns, any conflicts or changes noticed as well as morale and perfor- 
mance evaluations. Next, the status of the tasks planned for the previous day 
was discussed, as well as the tasks for the day. Most tasks were related to the 
planned EVAs', habitat upgrades and related engineering tasks as well as the 
XR research that was being conducted as the main aim of the mission (Fig. 1). 


execution (achieved / 


name of the objective partially / issues / comment 

neglected) 
Saliva Collection achieved 
Outreach Session achieved 
Participatory prototyping session: VR, AR, XR achieved 
BORP neglected no longer required 

3 some minor visual improvements may be 

SWAMP achieved done before EVA on T+8 
Tablet holder achieved 3 straps, 3 tablet holders, 6 adapter plates 
MC - Briefing Questionnaire 1 achieved 
MC - Check the dust sensor with IT team partially to be worked on by excursion crew during 
hydroponic setup test, observations gathered partially 
aeroponic setup test, observations gathered delayed no roots, not enough pre-shower water 
Grafana review neglected ran out of time 


Fig. 1. Overview of task statuses from Day 6, as discussed on Day 7 of the mission 


1 EVA, that is Extravehicular Activities, were planned for every second day, and they 
constituted a major part of the simulation, with the exploration of the lunar surface, 
building up the SWAMP temporary moon base environment (adding sensors and the 
Internet connection) or testing different analog astronaut suits, such as the BORP 
suit as well as documenting the lunar surface. 
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Audio Briefings. Audio briefings were held on a dedicated channel on the 
habitat’s Discord server, and the Commanders participated in them using either 
their personal laptops, or the personal mission tablets given to them at the 
start of the mission as well as personal headphones. During the audio briefings 
all of the participants looked at a shared Google Spreadsheets file, which had 
a sheet dedicated to the activities and habitat and crew statuses of each day. 
Additionally, there was an EVA file, with all the objectives and tasks related to 
the next EVA (Fig. 2). 


Fig. 2. Example activities reported during daily briefings. On the left engineering of 
the BORP suit collar, on the right an EVA mission in progress viewed from a lunar 
rover camera by HabCom, that is the two crewmates staying behind as support. 


VR Briefings. Two commanders and one member of MC participated in the 
VR briefings using Oculus 2 VR standalone headsets, on the first day in the same 
room, and the following days in separate rooms. Additionally, one member of 
MC connected to these briefings using a desktop application without using their 
webcam. The VR briefings were held using the Meta Horizon Workrooms and 
after the first meeting the whiteboard functionality was utilized to discuss tasks 
and their progress: an image of a relevant snippet of the Google Spreadsheets 
file was captured and uploaded to the Horizon Workrooms and pulled up to the 
whiteboard. 


2.3 Briefing Evaluation Tools 


All data was gathered using a timed survey built with Movisens: at 10:00 AM, 
after each briefing the commanders had an alarm on their tablets, which took 
them to the installed Movisens app where they jumped right into filling out a 
brief questionnaire. If this alarm was missed, then they received another prompt 
at a later time. Most questionnaires were submitted a few minutes after 10:00. 
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The questionnaires consisted of a query on the mode of participating in the meet- 
ing (Audio or VR), technical difficulties, discomfort and three longer questions 


asking to rate the experience on sliding visual analog bipolar scales, expressed 
underneath by ratings from 0 to 100. 


— Meeting satisfaction was measured with a scale developed by Rogelberg et 
al. [18] after some adjustments. We have adjusted the labels of the scales 
to instead of “stimulating/boring” read “engaging/non engaging”, instead 
of “enjoyable/unpleasant” to “pleasant /unpleasant” and instead of “satisfy- 
ing/annoying” to “meets expectations/doesn’t meet expectations”. 

— Social presence questions are inspired by Nowak and Biocca [16] and Bailen- 
son, Blascovich, Beall and Loomis [2] (but see also Swidrak, Pochwatko and 
Matejuk [21]); social interaction by Nezlek, Imbrie and Shean [15]. 

— The measures related to the mental state recorded after each meeting were 
based on Kashdan et al. [11] and Ciarocco, Twenge, Muraven, and Tice [6]. 


Additionally, meeting notes and impressions after the briefings were recorded 
in a journal by the executive commander and the crew kept written journals of 
their experience, as part of an autoethnographic project of another crewmate 
(Fig. 3). 


Select the format of the meeting you How do you rate this meeting? During the meeting During the meeting 
have just had: 
Oaudio pleasant unpleasant 1 felt I got involved in the meeting 1 felt that we operated as a real team 
= o not at all completely not at all completely 
Over e © 


Did you have any physical discomfort 


engaging 


non engaging 


during the meeting due to used © 1 felt close to other participants I had the impression that we were in the 
technology? not at all completely same space with other participants 
None A lot - not at all completely 
© intensive unintensive @ © 
© I was able to assess the reactions of the 
Did you have any technical difficulties other participants 1 found it difficult to focus completely 
during the meeting due to used fast slow pot atall completely on the meeting 
technology? not at all completely 
None Alot © e © 
© ee. doesn't meet 1 felt that the others were attentive to 
expectations expectations my needs and feelings 
© not at all completely 


Fig. 3. Movisens application, showing the first four screens (out of six) of the post- 
briefing questionnaire employing visual analog bipolar scales. The visible questions are 
derived from the Meeting Satisfaction scale [18] 


3 Results and Discussion 


The quantitative data consists of ratings from two participants of 3 audio meet- 
ings (6-8.05.2022), followed by three VR meetings (9-12.05.2022). The data for 
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5.05 and 10.05 was excluded as it was incomplete. The data was gathered after 
a week of regular audio meetings, so that participants could get used to holding 
daily briefings in their default mode before introducing first the questionnaire, 
after each audio briefing, to establish a baseline, and then the VR meeting mode. 


3.1 Technology and Overall Rating 


Both audio and VR were rated as comfortable and free of difficulties with VR 
causing slightly more physical discomfort. Journal notes on the first VR meeting 
read: There were two short freezes, but other than that it was enough to conduct 
the meeting. (...) We need to do it in two different rooms, because it is confusing 
to hear each other twice. The overall rating of the experience in answer to the 
question “How do you rate this meeting?” is strongly in favour of VR, as it was 
rated as nearly double as pleasant, engaging, intensive and fast when compared 
with audio meetings (see Fig. 4). 


© Audio (P1) E VR(P1) © Audio (P2) m VR (P2) 
100 96.7 94.7 


91.3 


50 


0 1 
Pleasant Engaging Intensive Fast Meets expectations 


Fig. 4. Rating averages of the audio and VR meetings 


Overall, the briefings held in VR were rated at 86/100 vs 55.9/100 for Audio, 
on whether they have met expectations. 


3.2 Impressions During the Meeting 


The impressions of the audio meetings recorded in the journal were that the 
meetings felt slow and not engaging, as if the Mission Control Staff was experi- 
encing time slower, it felt as if they even talked slower. This is recorded as the 
cause that the commanders were likely to lose focus during audio briefings and 
drift off, or start doing other tasks meanwhile. The impressions of the perception 
of time by MC and their subjective reported effect on the commanders may in 
part be attributed to the issues studied by Kansas et al., who found that with 
increased quantity of communication, there is an increasing chance of astronauts 
displacing the problems they experience onto ground control and experiencing 


294 K. Skorupska et al. 


more negative effects of ICE [10]. Meanwhile, journal notes from the first VR 
meeting read: It was awesome! Much better than the audio meetings we've had. 
The pacing was better (it was same time but more filled with information) and 
it was nice to share the same space, because we could use gestures, especially to 
explain some engineering that we’re doing (picture + gestures for VKK helmet) 
and gestures for the BORP on how it has broken at the collar attachments and 
the lights and the gloves. We also weren’t as frustrated and shared more, as we felt 
we can be understood and gauge the others’ impressions of what we are saying. 
The commander said he never wants to have audio briefings again:) Especially 
the use of gestures, enabled by VR communication, seems to be highly benefi- 
cial for communication quality while explaining engineering issues or expressing 
emotions. Similar impressions are visible in survey results in Fig.5, as the par- 
ticipants felt as if they participated in face-to-face meetings, sharing the same 
space with others, who were attentive to their needs. 
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Fig. 5. Average impressions of the Audio and VR Meetings from P1 and P2 


3.3 Mental State After Each Meeting 


Mental states were mostly more positive after the use of VR, as shown in Fig.6. 
The only exception here was being “sad” and “anxious” for P2, however that 
may be attributed either to the slowly approaching ending of the mission, which 
also coincided with VR meetings or other, unrelated, circumstances. However, 
on other occasions the whole crew mentioned that as the mission was drawing to 
an end they felt progressively sadder. Overall, after having used VR for briefings 
the participants were a bit more content, enthusiastic, joyful, and relaxed, while 
after audio briefings they were slightly more mentally exhausted. An interesting 
research direction is the effect of VR on mood and feelings about one's cognitive 
performance. In this study, the crew mentioned increasing problems as more 
time in confinement passed, as indicted by this journal entry: We have cognitive 
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decline that we don’t recognize, because it sneaks up on us. Astronauts have it 
too - why does that happen though? (...) This is actually a bad problem because 
it frustrates people. We have trouble learning new stuff. Forget e.g. chords, have 
trouble learning and explaining new games, have trouble remembering where stuff 
is either physically or on the drive. 
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Fig. 6. Average mental states of P1 and P2 after each type of the meetings 


4 Conclusions and Future Work 


This pilot study points to possible positive effects of enriching audio-only com- 
munication with VR, especially in stressful settings, requiring frequent reports 
and demanding high understanding between participants. VR facilitates com- 
munication, especially when explaining complex problems and experiences and 
it provides the sense of connection and empathy, that may be lacking when it 
comes to audio calls. VR-meetings forced participants to fully focus on them 
and to be in the moment, despite their perceived cognitive decline and the stress 
of meeting daily goals. This resulted in feeling of being in a shared space, not 
only physically, but also in terms of common understanding and emotional con- 
nection. Future work ought to explore the effect of substituting other modes of 
communication, either audio or video, with VR in longer, larger studies. 
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Abstract. The purpose of this study was to build and evaluate a laboratory stand 
for testing MV switchgears using VR technology in the power industry. In the 
course of work, an exemplary program of operation of the application created by 
PRADMA was developed. The developed program was designed to effectively 
introduce the application user to the basic information about moving the virtual 
avatar, interacting with the prepared elements, and to provide information on the 
construction of the MV switchgear and its components. In order to confirm the 
advantages of VR technology and the quality of the developed VR application 
program, tests of the VR application were carried out with the help of students of 
the Warsaw University of Technology. The results of the evaluation questionnaire 
created for the purposes of the tests were used to develop conclusions regarding the 
use of VR in the power industry. The evaluation questionnaire questions related to 
key issues such as the quality of the mapped virtual elements, the level of realism 
preserved in the virtual environment, the effectiveness of the developed program 
or the substantive values of the entire exercise. The summary contains the answer 
to the theses put forward in the paper regarding the profitability and usefulness of 
using VR applications in the power industry. 


Keywords: evaluation - switchgear - power engineering - virtual reality - 
application - user - avatar - interaction 


1 Introduction 


At the turn of the last twenty years, significant progress has been made in the field 
of computer sciences and information technology. New solutions and applications of 
modern technologies have been intensively searched for in both the science and enter- 
tainment sectors in order to facilitate and optimize everyday life. This translated directly 
into the development of technology which is virtual reality, which with each successive 
year manifests its application in subsequent areas of life. 

Virtual reality (VR) is used in those industries where making a mistake may result in 
damage to health or exposure to costs related to the destruction of equipment. Due to the 
affordability and wide range of possibilities of this technology, it is gaining more and 
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more popularity all over the world. Among others, Schneider Electrics used virtual reality 
as a training medium in the field of electrical power devices offered by the company 
[1]. Another example of the use of virtual reality in practice is a project implemented by 
Enea Operator, consisting in mapping a training means for learning work under voltage. 
This project involves the development of training scenarios for selected Main Power 
Supply Points and MV stations [2]. 

Computer-generated artificial reality is not limited to a computer that generates 
virtual space. Necessary to use VR technology are devices that are the medium between 
humans and the virtual world. The combination of such devices enabling real-time 
interaction of the real world and the computer-created space, creating an extension of 
the real world, is called AR (Augmented Reality) [3, 4]. This interaction applies to both 
adding virtual objects in the real world and placing real objects in the virtual space, 
creating a mixture of both these environments. 

The main purpose of this work was to create and evaluate a laboratory stand for 
testing MV switchgear with the use of VR technology. As part of the work, an exemplary 
program of operation of a VR application developed by PRADMA and Institute of 
Power Engineering in Warsaw University of Technology (WUT) was developed in order 
to create a comfortable and effective environment for teaching purposes in the field of 
power engineering, with the specification of issues related to medium voltage switchgear. 
Then, on the basis of the developed scheme, application tests were performed and an 
evaluation questionnaire was conducted to develop the results of the tests. 

During the work, the profitability of the VR application, the impact of its use on the 
learning outcomes and other aspects of the use of virtual reality, such as its impact on 
the well-being of users or the level of complexity of operating devices in virtual space, 
were analyzed. 


2 The MV Switchgear 


The main aspect of the VR application that was used for the work is the virtual model of 
the MV switchgear. A switchgear is a set of electrical power equipment operating at the 
same rated voltage, used to distribute electricity. It consists of a structure equipped with 
busbars and insulating elements, as well as electric power equipment serving as distri- 
bution, protection or measurement [5]. The medium voltage switchgear implemented in 
the application is based on the real model of the switchgear produced by Elektrometal 
Energetyka SA presented in Fig. 1. 

MV switchgears with power equipment are characterized by characteristic electrical 
quantities, the values of which depend on the method of execution, electrical solutions 
and materials used. These values are crucial when selecting the equipment, and the 
designers make every effort to achieve the highest possible values of current, voltage or 
temperature to which the switchgear will be adapted, while keeping its dimensions and 
production costs as small as possible. The standardization of the e2ALPHA switchgear 
in terms of limit values and design applications is described in standards, i.e. [6]. Due 
to the nature of the work, the focus was mainly on the design properties of the discussed 
switchgear. 

MV switchgear as an electrical power device should be as reliable and safe as possible 
for its users. For the benefit of its users, switchgears are equipped with a number of 


300 T. Daszczyński et al. 


interlocks related to the user’s inaccessibility to conductive elements during its operation 
under voltage. One of the most important interlocks in the e2ALPHA switchgear are 
interlocks for the withdrawable unit compartment door while the withdrawable unit is 
in the operating state, the interlock of closing the earthing switch during the operating 
state of the movable unit or the interlocking of opening the cable compartment while 
the earthing switch is in the open position. 


Fig. 1. MV switchear from Elektrometal Energetyka SA (picture taken in WUT). 


3 Development of a VR Application Operation Program 


The aim of the program development was to adapt the user of the application to the 
virtual reality environment, to familiarize with the rules of moving inside the virtual 
space and to perform two pre-prepared didactic exercises in the field of testing the MV 
switchgear and its components. 


3.1 Description of Equipment and Installations 


For the proper functioning of the VR application, a set of necessary devices was used, 
which included: 


— VR glasses, two touch controllers included in the Oculus Quest 2 set [7], 
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— A computer equipped with a modern processor and graphics card, with sufficient 
computing power, 

— Monitor displaying a view from the perspective of using VR glasses, used to analyze 
the respondent’s movements in real time, 

— USB type cable - USB Type C, connecting the glasses with a computer unit in order 
to transmit the image to the monitor and increase the computing power of the glasses. 


The created laboratory stand is shown in Fig. 2. In the further description of the 
application, for easier identification, the user is the person who is currently using VR 
glasses, and the avatar is the character that the user moves in the virtual world. 


eer 


Fig. 2. Laboratory setup in the Laboratory of Power Apparatus and Switching Process (picture 
taken in WUT). 


3.2 Description of the Application Interface Using the Tutorial Example 


After launching the VR application, the user will see a menu on the glasses screen 
containing the selection of a specific part of the exercise. The virtual application menu 
is presented on the Fig. 3. The selection is made using the main button, targeting the 
selected option with the indicator generated by the controller. During the first contact 
with the program, the target option will be to select the tutorial first. 

After selecting the tutorial, we will be greeted with a visual message along with 
the tutor reading its content, in which we will be informed about the purpose of the 
tutorial. After accepting the message with the main button, you will find yourself in a 
virtual room. The teacher will then inform the user about the rules of navigating in the 
virtual world. The avatar is rotated by physically turning the user’s head or by using 
the knob on the top panel of the controller. Moving around the virtual room takes place 
through physical movement of the user or teleportation with an avatar. After pointing 
the controller pointer at any place on the virtual room floor, the user will see a circle 
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appearing in the selected place. After pressing the circle, the avatar will be teleported 
to the indicated place. Such an application is intended to facilitate the user's movement 
and to limit the user's movement within the security zone. 

Then, the user will be informed about the possibility of interacting with objects using 
the main controller button and about the possibility of catching virtual items. Using the 
middle finger button, the user can grab the switchgear element and rotate it freely by 
moving the controller. After executing the instructions, the user will be asked to check 
the acquired skills by pressing the green button on the wall and dragging the switch in 
the trolley into the vicinity of the green zone. After completing the tutorial, the user will 
be informed about the possibility of leaving it. 


WYBIERZ ĆWICZENIE 


Samouczek 


Ćwiczenie | 
Budowa rozdzielnicy 


Ćwiczenie 2 
Diagnoza rozdzielnicy 


Wyjście 


Fig. 3. Basic menu of the VR application. 


3.3 VR Exercise 1 


After completing the tutorial, the user from the application menu should enter the main 
part of the first exercise, entitled “Switchgear construction”. In this part of the exercise, 
the user was presented with the construction of MV switchgear on the example of 
the e2ALPHA switchgear described in chapter two. The user of the application will 
get to know the basic structural elements of the switchgear, view its devices, read the 
information prepared about them and view construction diagrams displayed on the virtual 
board. 

After entering the exercise part concerning the construction of the switchgear, the 
user will be transported again to the virtual room and will be informed about the purpose 
of the exercise. Then the user will be instructed to approach the switchgear located 
in the room and to click it with the main button. After these steps are completed, a 
short animation will be shown showing the elements separating from the switchgear and 
moving towards the center of the room. 

The element to which the controller will be approached will be highlighted in yellow, 
which mean that one can interact. After pressing the button, the avatar will be transported 
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to the room where the user can freely move the isolated element, read the information on 
the board and view a photo or diagram of the actual device. Example of isolated element 
of MV switchgear (circuit breaker) is shown in Fig. 4. 

After examining a given element, the return button on the wall can be pressed, which 
results in returning to the training field and marking the examined element on the list of 
elements on the wall, presented in Fig. 5. 


Fig. 4. Separated elements of the virtual switchgear (left) and the virtual model of the e2BRAVO 
circuit breaker (right). 


Zapoznaj sig z 


Fig. 5. The list of possible to view elements in MV switchgear. 


Additionally, on one of the walls of the virtual room there is a table with diagrams 
of devices and elements used in the exercise. After pressing an arrow associated with a 
given element, its detailed diagram will be displayed. 
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3.4 VR Exercise 2 


In the second exercise, entitled “Switchgear diagnostics”, the user was supposed to use 
a trolley to pull out the MV circuit breaker located in the withdrawable compartment 
of the switchgear. Then he was subjected to the task of finding five damages to the MV 
switchgear and its elements caused by the action of an electric arc. In order to open 
the withdrawable compartment of the switchgear, it was first necessary to release the 
lock preventing the opening of the compartment door in the absence of safe conditions. 
This is done by then opening the switch contacts by pressing the red button responsible 
for controlling the switch and closing the earthing switch via the green control button 
of the earthing switch. If any wrong steps are taken, the user will be notified that they 
cannot be performed. On Fig. 6 the MV circuit breaker used in the exercise is shown. The 
maneuvering of devices and electrical apparatus in the model is carried out in accordance 
with the principles of operating works at MV switchgears. 


Fig. 6. The MV circuit breaker in VR application 


4 Testing of VR Application 


For the purpose of testing VR applications, a group of forty-three students of electrical 
and related faculties was assembled. Before performing the VR application test, each 
student was acquainted with the construction of a real MV switchgear located at the test 
site, shown in Fig. 1. In order to verify the correctness of the switchgear implementation 
into the virtual environment. 

Then, each student started the application program discussed earlier, and then con- 
veyed his feelings from the virtual exercise by performing a prepared evaluation ques- 
tionnaire. The basis of the survey questions were questions developed by the author of 
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this work and, inter alia, in close cooperation with Dr. M. Marzec, Dr. M. Kotyśko and 
Dr. E. Waszkiewicz, employees of the Faculty of Social Sciences of the Department 
of Clinical Development and Education at the University of Warmia and Mazury. The 
questions were arranged for the purposes of the study in order to develop conclusions 
from the use of VR applications. The survey was made with the use of the Microsoft 
Forms survey editor and consisted of 45 questions concerning the feelings associated 
with the use of VR, the accessibility of this technology, the developed exercises and the 
level of mapping virtual elements. The results of the survey were then used to develop 
the results of the tests and the quality of the prepared application scheme. 


5 Summary and Conclusions 


The proposed VR application operation program is a proven example for substantive 
classes in the field of power engineering, detailing issues related to the MV switchgear. 
It allowed for quick and effective adaptation of the user to the virtual world, while 
leaving him as much knowledge gained during the prepared exercises as possible. The 
test participants showed a high level of focus and interest during the execution of the 
developed VR application program. 

The laboratory stand created for the purpose of testing the application allowed for 
the effective conduct of the research participants through the developed diagram of the 
VR application. The process of creating a laboratory stand allowed to learn about the 
affordability and ease of installation and operation of equipment adapted to virtual reality 
operations. 

The prepared evaluation questionnaire allowed to collect the users’ feelings about the 
prepared exercises and the overall use of VR. The test participants’ answers confirmed 
the belief that VR technology is attractive in the academic environment. According to the 
respondents, the exercises prepared in VR made it possible to effectively remember the 
presented content about the switchgear and its elements, which confirmed the sense of 
using VR as a way to supplement standard teaching techniques. The models implemented 
in the VR application were recreated realistically enough to ensure a positive reception of 
the application among the respondents. The entire VR exercise was received positively 
by the test participants and they firmly supported the attractiveness of the presented 
exercises and the opportunities created by virtual reality. 

Current attempts to use VR technology indicate the trend of the increasingly common 
use of VR technology in subsequent sectors of life. In the future, the discussed VR 
application will be developed in a project from Polish Ministry of Education and Science 
“Development and implementation of a cognitive training platform using VR and AR 
technologies”. 


“Publication co-financed from the state budget under the program of the Minister 
of Education and Science under the name” Science for Society "project number 
NdS/532664/2021/2022 amount of funding 13 500 zł total value of the project 
1 355 390 zł”. 
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Abstract. The following paper introduces a new way of presenting the 
results of engineering simulations. The object of consideration is the 
motion of the snake robot on a flat surface. The robot’s trajectory and 
control signals are calculated in MATLAB. Different approaches have 
been presented to show how the robot moves - from 2D plots and 3D 
animations observed from a computer screen to realistic visualisations 
displayed in the Virtual Reality headset. The proposed VR simulation 
will allow watching the simulation results and manipulating simulation 
parameters from inside of VR. 


Keywords: Virtual Reality - Snake robot - Simulations of multi-body 
systems 


1 Introduction 


Biomimetic robots are a class of robots that resemble a living organism’s shape, 
appearance, or behaviour. They are designed to use biological principles in engi- 
neered systems to behave like a natural being, allowing them to solve specific 
problems, not possible for standard machines [1]. A snake robot is an example 
of a biomimetic, hyper-redundant robot with many degrees of freedom. Changes 
in the internal shape cause the snake robot’s motion, similar to the natural bio- 
logical creatures. Each robot configuration is characterized as a series of angles 
in joints connecting a series of robot segments [2]. 

Virtual Reality (VR) provides a platform to visualize the immersive 
behaviour of objects in a three-dimensional environment. Nowadays, VR. is 
widely used in entertainment to enhance immersive movements. However, VR is 
also popular among researchers to construct complex environments in simulation 
and observe the behaviour of objects. 

A comprehensive review of virtual reality interfaces for controlling and inter- 
acting with robotics is presented in [3]. The authors describe that VR interfaces 
are not only used for visualization but also for robotic interaction, planning, 
usability and infrastructure for both expert and non-expert users. However, this 
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review couldn't include biomimetic robotics. In [4] motion of haptic snakes- 
serpentine shapes is investigated for multiple feedback, i.e. tapping and gesture 
feedback in virtual reality. However, literature related to snake robots in virtual 
reality is limited. 

The main aim of this paper is to present a VR environment prototype to 
visualise the snake robot’s motion. The proposed system can be used to evaluate 
different robot control algorithms. 

This article is organized as follows. The Introduction section describes 
the importance of virtual reality simulations in snake robot design. Section 2 
describes the snake model used in this paper. Section 3 presented a snake robot 
simulation in MATLAB. Section 4 presents results and simulations in MATLAB 
and Unity software. Finally, the conclusions and future works are given 


2 Snake Robot Model 


The model used in the article is based on widely used equations of snake robot 
motion on a flat horizontal surface. The dynamical model of the snake robot can 
be derived based on the torque equilibrium equation. A detailed description of 
the model can be found in [5] and [6]: 


M,68 + wea = ISPKfR.x + ICoKfpy = Du (1) 


where Sp = diag(sin(6)) € RYN and C = diag(cos(0)) e RY*% are square 
matrices with trigonometric functions of link angles at the diagonal and zeros 
in the remaining elements, and link angles 0 = [6;,... On| € RV in global 
coordinate system. The vector u e RV"! defines the controllable parameters - 
actuator torques exerted on successive links. The fr, z and fr, vectors represent 
components of friction force on the links in global x and y direction. 

The movement of the snake robot is possible due to anisotropic friction force. 
The friction coefficient in the longitudinal direction of each joint is much lower 
than the coefficient in the lateral direction. This property allows the robot joints 
to slide in the forward direction. 


3 MATLAB Simulations 


The robot’s equations of motion have been implemented in MATLAB software 
and solved using the ode solver. MATLAB implementation also included control 
algorithms that allowed the snake robot head to reach a designated position. It 
is also possible to track the position of the robot’s centre, but it is less useful in 
trajectory tracking problems. 

A simple path-following method by the snake robot is called a Line-of-Sight 
(LoS) method [7]. According to this method, the robot is tracking a straight line. 
This approach requires the definition of the global coordinate system {x,y} in 
which the x axis is aligned along with the forward movement. The implementa- 
tion of LoS algorithm is available at [11,12]. Full description of the program and 
implemented snake robot model can be found in [9]. 
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The MATLAB Model Predictive Control Toolbox allowed the implemen- 
tation and testing MPC algorithm in trajectory tracking. Two MPC varia- 
tions have been tested - when all joints of the robot could move independently 
and when all joints were coupled in sinusoidal function. In the first case, the 
MPC algorithm calculated nine independent variables for ten segment robot. In 
the second case, there where only three variables - parameters of the function 
a1(sin(a2 + ag :i)) where i is the segment’s number. Figure 1 shows the compari- 
son of the resultant trajectory of the robot head for different control algorithms. 
During simulation the robot consisting 10 segments of length 0.2m the robot 
had to achieve the following points one by one: (2.5m, 0m), (3m, —1m), (4m, 
0m), (6m, 1m), (8m, 1m). 


snake robot trajectory 
ZDARZA 


—MPC - 3 signals 
—MPC - 9 signals. 


Fig. 1. Plot showing trajectory of the snake robot’s head for different control algo- 
rithms. 


The resultant position and orientation of snake robot links at each moment 
have been saved to the txt file for each control strategy. Based on this informa- 
tion, visualisation programs can fully restore the snake robot’s motion 


4 Visualisation of the Snake Robot Motion 


4.1 Simulink 3D Animation 


Visualization of robot movement is implemented using a MATLAB Simulink 
3D Animation toolbox. The geometry of the robot segment is imported from 
the STL file created in SOLIDWORKS. The details of the segment design were 
described in [9]. The introduced framework allows simulating the robot’s motion 
for different geometry parameters (e.g., number of joints, weight, friction coeffi- 
cients), target trajectories (position of points to follow), and control parameters 
(e.g., head/centre tracking, controller gains, joints’ disturbances). The ready-to- 
use code with comments, helping to run the software, is available in [11] and 
[12]. The main LiveScript file start.mlx contains a detailed description of the 
model and its implementation. The Simulink file VRmodel2.s1x enables running 
3D visualization of the snake motion shown in Fig. 2. 
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Fig. 2. Visualisation of the snake robot implemented in Simulink 3D Animation. 


4.2 3D Simulations in Unity 


The following work step was the implementation of the 3D visualisation of the 
snake robot using software dedicated to graphical animations. We used Unity 
Engine and Oculus Interaction SDKK [8]. The implemented program allows 
simulation in Oculus Quest 2 headset [10]. The user can interrupt the program 
execution using VR controllers. The visualisation program used the same CAD 
segments as the animation described in Sect.4.1. The position and orientation 
of the robot segments in consecutive moments were interpolated using data read 
from a txt file generated by MATLAB. The virtual environment also displays 
the location of the reference point the snake is currently following (Fig. 3). 


ę A 
A 


Fig. 3. Visualisation of the snake robot implemented in Unity. 


The main advantage of VR visualisation is that the user can observe the 
snake robot’s motion from any distance and perspective. The proposed solution 
allows users to catch any distortion of robot motion and get an idea of how 
would the real robot behave. 
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Aside from the segments position program can also visualise the velocity of 
each segment. For this purpose, an animation shows a point corresponding with 
the future segment’s position. The current and future coordinates line describes 
the robot’s velocity value and direction. Users can change the selected link at 
any time during the simulation (Fig. 4). 


Rubish Mechanics 
awa: 


a 
p—s 
Pe 


Fig. 4. Graphical User Interface and controllers ray allowing interaction with GUI. 


By manipulating the Oculus Quest 2 controllers, the user can change the 
speed of the simulation, stop it or rewind it. There is also a Graphical User 
Interface (GUI) shown in VR, which allows manipulating the simulation time, 
restarting it or running different simulations by changing the file with the posi- 
tions of robot segments. The GUI provides for changing the selected link for 
velocity visualisation and includes sliders, push, toggle, and a drop-down list. 
The user interacts with GUI elements by “ray” going out from the controller. 


5 Conclusions 


Virtual Reality is a very innovative and quickly developing technology. It offers 
the possibility of not only observing but also experiencing the virtual environ- 
ment. Above entertainment, education and socialization, it brings unique oppor- 
tunities in engineering research. The article shows a novel way of using VR for 
visualization of the motion of the snake robot. This approach can significantly 
affect the development of control algorithms for robotic systems. 

Future work should concentrate on observing the snake robot’s motion and 
interacting with it from VR. Visualisation in Unity and simulation in MATLAB 
will run parallel, and there will be established communication between both 
programs. The program will allow changing of the robot’s trajectory from the 
interior of VR, so the observer wearing a VR headset could analyze the robot’s 
motion and interact with it in real-time. The destination point that the robot 
should reach would be assigned by a user using the VR controller. The user will 
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mark the desired position of the robot on the floor and click the trigger button 
to record it. Unity would send the coordinates of the chosen point to MATLAB. 
Unit-MATLAB communication will be based on TCP protocols. The control 
algorithm implemented in MATLAB would calculate the target trajectory based 
on the received reference point. The computed positions of segments will be sent 
back to Unity, and the output robot’s configuration will be displayed in VR. The 
new versions of the program GUI will include the algorithm selection and allow 
for parameter changes. 
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Abstract. Stroke is the second cause of mortality and one of the leading causes 
of disability in adults. Post-stroke complications involve many different systems 
through which they involve difficulties in daily life. A very common complication 
that involves about 25—30% of post-stroke patients is spatial neglect syndrome, 
which involves the impaired perception of one's body and space. An important 
aspect of treatment for stroke patients is rehabilitation, both while still in the 
hospital and later in rehabilitation facilities as well as at home. Many studies have 
shown effective virtual reality (VR)-based general therapy systems after stroke. In 
particular, systems for motor function rehabilitation. In the following paper, a game 
proposal for the rehabilitation of patients with unilateral spatial neglect syndrome 
is shown. This game takes into account the specific perception and special motor 
skills of patients with spatial neglect syndrome. The described game was presented 
to a team of rehabilitation specialists working at the Department of Neurology and 
Stroke Unit of the University Clinical Hospital in Białystok and was evaluated by 
these specialists. 


Keywords: Rehabilitation - Virtual reality in medicine - Stroke - Spatial neglect 
syndrome 


1 Introduction 


A stroke is a life-threatening emergency condition that results in acute symptoms of focal 
damage to the brain, spinal cord, or retina. It represents a very significant social problem. 
Every year 90 thousand people in Poland suffer a stroke, while in the world itis 17 million. 
Moreover, there is an increase in the number of cases in young and middle-aged people. 
An important side of this condition is also the social aspect. Stroke is the first cause of 
morbidity, long-term disability, and epilepsy in the elderly [1]. It is also the second cause 
of dementia and the second cause of death [2]. Complications following stroke include 
cardiac, pulmonary, gastrointestinal, musculoskeletal, neurological complications, and 
other not classified complications like fatigue, depression, or fever [3]. A complication 
that affects about 25-30% of patients is unilateral hemispheric neglect syndrome [4]. 
This is a disorder of spatial attention in which the perception of and response to stimuli 
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from the part of the body opposite to the location of the stroke focus are reduced. The 
consequence of this syndrome is increased length of hospitalization and increased cost 
of care [5]. This is due to the need for comprehensive rehabilitation [6]. 

Community-based, outpatient, and home-based rehabilitation enable improved func- 
tioning, acquisition of new self-care skills, and partial or complete recovery of indepen- 
dence [7]. The extent of rehabilitation should always be tailored to the needs of the 
patient. The goal of physiotherapy in a patient with a motor deficit, whether resulting 
from damage to the primary motor cortex or more complex, is to restore motor skills or 
compensate for them. Movement deficits can be addressed directly by movement ther- 
apy with active patient participation. The guidelines for stroke management mention 
rehabilitation with virtual reality, indicating that there are high hopes for this form of 
rehabilitation [6]. Studies show that using immersive VR for the general rehabilitation 
of stroke patients improves their balance, reduces the risk of falls, and improves the 
perception of visual verticality [8]. Other studies indicate that VR can reduce upper limb 
motor disabilities [9, 10] and can encourage physical activity and social participation 
[11, 12]. 


2 Methods 


2.1 Rehabilitation in Semi-neglect Syndrome 


Neurological physiotherapists indicate that during exercises for unilateral atrophy syn- 
drome, various stimuli should be involved: auditory and visual, which should direct the 
patient's attention to the neglected side. It is important that both limb exercises on the 
neglected side and visuospatial search training be adapted to the patient's altered or, 
if possible, corrected midline. The eye or limb movement should be progressively per- 
formed from the non-skipped side towards the skipped side (to a line or fixed point/object 
that is clearly visible to the patient) [13]. 

The first exercises should be based on grasping objects with the non-neglected side 
and transferring them, by crossing the midline, to the neglected side. Subsequent exer- 
cises can also be based on grasping objects with the neglected side and transferring them 
to the neglected side - the alternating handwork of crossing the midline is intended to 
make the brain aware of the neglected side again. 


2.2 VR-Based Application for After-Stroke Complications 


Based on the above physiotherapists’ guidelines, a VR game was developed (Fig. 1.). It 
was located in a quiet, open forest area, which allows the patient to relax while receiving 
physiotherapy treatments. The aim of the game is to collect apples placed on the trees 
and place them in the boxes appearing on the opposite side. It is important to note that 
red apples can only be picked with the right hand and green apples with the left hand (if 
you try to grab an apple with the opposite hand, the box will not appear). 
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Fig. 1. VR game for a patient with hemineglect, unilateral neglect syndrome 


Figure | shows that the patient’s task is to grab a green apple with his left hand and 
move it to the crate on the right. During this movement, there is a crossing of the midline 
and a visual search of the space on the neglected side. Additionally, correctly locating 
the box and hitting it with the apple is reinforced with a haptic stimulus in the form of 
a vibrating controller. At the same time, the patient may also attempt to grasp the red 
apples with the other hand and move them to the opposite side. The ability to grasp with 
both the active and inactive side also helps to train manual dexterity of the neglected 
side. 

During the game, the patient can check his progress by monitoring the statistics: the 
number of green apples collected, the number of red apples collected, the percentage of 
apples correctly thrown, and the average time from collecting any/red/green apples to 
throwing them into the box. 

The game created in this way was presented to a team of neurological rehabilitation 
specialists working at the Department of Neurology and Stroke Unit of the University 
Clinical Hospital in Biatystok, composed of Agnieszka Zieziula, M.Sc. in physiotherapy, 
Izabela Zalesko, M.Sc. in physiotherapy, and Justyna Karpińska, M.Sc. in physiotherapy 
(Fig. 2.). 
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Fig. 2. Game testing by a team of physiotherapists 


3 Results and Conclusion 


After testing, the game was commented on by a team of physiotherapists. In their opin- 
ion, it has potential in the rehabilitation of patients with upper limb paresis as well as 
a certain group of patients affected by unilateral spatial neglect syndrome. In the case 
of unilateral atrophy, the target group could be patients in the home rehabilitation stage. 
Among the advantages of the developed application, physiotherapists emphasize the cor- 
rect course of movement of the upper limb, as well as the frequent crossing of the center 
line. In addition to the physiotherapeutic elements, specialists also note the pleasant 
environment, which can contribute to prolonged physiotherapy time. This is consistent 
with many other studies that indicate the effectiveness of using VR games in rehabilitat- 
ing various neurological conditions, such as Parkinson’s disease [14], spinal cord injury 
[15], and phantom pain [16]. In addition to improvements in physical performance, many 
studies also indicate greater satisfaction with physiotherapy among patients and more 
willingness to exercise, which translates not only into improvements in physical fitness 
but also in mental health [17, 18]. In the case of hospital patients, a simpler version of 
the game would be necessary - exclusion of movement in the game, limitation of control 
to one hand, and higher color contrast. This is due to the greater limitations of these 
patients and the need for better adaptation to their requirements. 


3.1 Further Steps 


It is planned to develop the application so that it is adapted to the needs of patients 
in hospital neurological and stroke wards. As part of the customization of the app, it 
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is primarily planned to simplify the game. In the game under development, the patient 
should not have to move around the environment. In addition, the game should be able to 
be operated with a single controller. Significantly from the point of view of physiotherapy, 
the destination - the box, should appear in different locations. 


References 


1. Lopez, A.D., Mathers, C.D., Ezzati, M., Jamison, D.T., Murray, C.J.: Global and regional 
burden of disease and risk factors, 2001: systematic analysis of population health data. Lancet 
367, 1747-1757 (2006). https://doi.org/10.1016/S0140-6736(06)68770-9 

2. Kim, W.-S., et al.: Clinical application of virtual reality for upper limb motor rehabilitation 
in stroke: review of technologies and clinical evidence. J. Clin. Med. 9, 3369 (2020). https:// 
doi.org/10.3390/jcem9 103369 

3. Kumar, S., Selim, M.H., Caplan, L.R.: Medical complications after stroke. Lancet Neurol. 9, 
105-118 (2010). https://doi.org/10.1016/S 1474-4422(09)70266-2 

4. Gammeri, R., Iacono, C., Ricci, R., Salatino, A.: Unilateral spatial neglect after stroke: current 
insights. Neuropsychiatr. Dis. Treat. 16, 131-152 (2020). https://doi.org/10.2147/NDT.S17 
1461 

5. Kot-Bryćko, K., Pietraszkiewicz, F.: Psychologia w medycynie . Część 2 — rehabilitacja 
neuropsychologiczna po udarze mózgu, pp. 344—347 (2012) 

6. Błażejewska-Hyżorek, B., et al.: Wytyczne postępowania w udarze mózgu. Pol. Przegląd 
Neurol. 15, 1-156 (2019). https://doi.org/10.5603/PPN.2019.0001 

7. Legg, L., Langhorne, P.: Rehabilitation therapy services for stroke patients living at home: a 
systematic review of randomized trials. Lancet 363, 352—356 (2004). https://doi.org/10.1016/ 
S0140-6736(04)15434-2 

8. Cortés-Pérez, I., Nieto-Escamez, F.A., Obrero-Gaitan, E.: Immersive virtual reality in stroke 
patients as a new approach for reducing postural disabilities and falls risk: a case series. Brain 
Sci. 10, 296 (2020). https://doi.org/10.3390/brainsci10050296 

9. Fang, Z., et al.: Effect of traditional plus virtual reality rehabilitation on prognosis of stroke 
survivors. Am. J. Phys. Med. Rehabil. 101, 217—228 (2022). https://doi.org/10.1097/PHM. 
0000000000001775 

10. Rodriguez-Hernandez, M., Criado-Alvarez, J.-J., Corregidor-Sanchez, A.-I., Martin-Conty, 
J.L., Mohedano-Moriano, A., Polonio-López, B.: Effects of virtual reality-based therapy on 
quality of life of patients with subacute stroke: a three-month follow-up randomized controlled 
trial. Int. J. Environ. Res. Public Health 18, 2810 (2021). https://doi.org/10.3390/ijerph180 
62810 

11. Mekbib, D.B., et al.: Virtual reality therapy for upper limb rehabilitation in patients with 
stroke: a meta-analysis of randomized clinical trials. Brain Inj. 34, 456-465 (2020). https:// 
doi.org/10.1080/02699052.2020.1725126 

12. Maggio, M.G., et al.: Virtual reality and cognitive rehabilitation in people with stroke: an 
overview. J. Neurosci. Nurs. 51, 101-105 (2019). https://doi.org/10.1097/JNN.000000000 
0000423 

13. Konkel, M., Drozd, A., Nowacka-ktos, M., Hansdorfer-korzon, R., Barna, M.: Zespół pomija- 
nia stronnego u pacjentów po udarze mózgu — przegląd metod fizjoterapeutycznych. Forum 
Med. Rodz. 9, 405-415 (2015) 

14. Lei, C., et al.: Effects of virtual reality rehabilitation training on gait and balance in patients 
with Parkinson's disease: a systematic review. PLoS ONE 14, e0224819 (2019). https://doi. 
org/10.1371/journal.pone.0224819 


15. 


16. 


17. 


18. 


Prototype of Virtual Reality Game to Support Post-stroke Recovery 319 


de Aratijo, A.V.L., Neiva, J.F.D.O., Monteiro, C.B.D.M., Magalhies, F.H.: Efficacy of virtual 
reality rehabilitation after spinal cord injury: a systematic review. Biomed. Res. Int. 2019, 
1-15 (2019). https://doi.org/10.1155/2019/7106951 

Osumi, M., Inomata, K., Inoue, Y., Otake, Y., Morioka, S., Sumitani, M.: Characteristics of 
phantom limb pain alleviated with virtual reality rehabilitation. Pain Med. 20, 1038—1046 
(2019). https://doi.org/10.1093/pm/pny269 

Sveistrup, H.: Motor rehabilitation using virtual reality. J. Neuroeng. Rehabil. 1, 10 (2004). 
https://doi.org/10.1186/1743-0003-1-10 

Rose, T., Nam, C.S., Chen, K.B.: Immersion of virtual reality for rehabilitation - review. Appl. 
Ergon. 69, 153-161 (2018). https://doi.org/10.1016/j.apergo.2018.01.009 


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, 
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate 
credit to the original author(s) and the source, provide a link to the Creative Commons license and 
indicate if changes were made. 


The images or other third party material in this chapter are included in the chapter’s Creative 


Commons license, unless indicated otherwise in a credit line to the material. If material is not 
included in the chapter’s Creative Commons license and your intended use is not permitted by 
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from 
the copyright holder. 


Author Index 


A Jedrzejewski, Zbigniew 150 
Ajenaghughrure, Ighoyota Ben 127 Jóźwiak, Rafał 45, 76, 85, 103 
Asesh, Aishwarya 26 
K 
B Karpowicz, Barbara 176 
Bäcker, Niklas 139 Kjellberg, Magnus 93 
Balki, Fatih 111 Klimaszewski, Jan 242, 250, 270 
Berdek, Kacper 298 Kobyliński, Paweł 202 
Białorucki, Przemysław 242 Kopeć, Wiesław 150, 176, 287 
Borkiewicz, Paulina 202 Kornacka, Monika 176, 287 
Brómmling, Lukas 139 Kozłowski, Marek 35 
Brzezinski, Piotr 35 Kupiński, Kamil 213 
Buchem, Ilona 139 
Buczek, Aleksandra 213 L 
Buczkowski, Przemyslaw 35 Lapin, Kristina 191 
Leischner, Vojtéch 3 
c Lorenc, Tomasz 76, 103 
C. Firmino De Souza, Debora 127 
Cam, Veli 111 M 
Carrasco Limeros, Sandra 93 Majchrowska, Sylwia 93 
Chakraborty, Somenath 67 Masłyk, Rafał 150 
Chaniecki, Zbigniew 213 Matys-Popielska, Katarzyna 281, 314 
Cnotkowski, Daniel 202 Mitura, Jakub 76, 103 
Możaryn, Jakub 233, 307 
D Murali, Beddhu 67 
Daszczyński, Tadeusz 298 Mykhalevych, Ihor 76, 85, 103 
Du, Huaiyu 45 
N 
F Nagajek, Damian 167 
Farooq, Muhammad Umar 12 Naruszewicz, Dariusz 298 
Naydonov, Mykhaylo 159 
G Naydonova, Lyubov 159 
Gorbenko, Iryna 76, 103 
Grudzień, Krzysztof 213 O 
Grzeszczuk, Maciej 287 Ordys, Andrzej 307 
Osęka, Laura 202, 222 
H Ostaszewska-Liżewska, Anna 250 
Hameed, Ayesha 307 
P 
J Pabiś-Orzeszyna, Michał 202 
Jaskulska, Anna 150, 176, 287 Piotrowski, K. 56 
Jeanty, Mathias 213 Piotrowski, Krzysztof 167 


© The Editor(s) (if applicable) and The Author(s) 2023 
C. Biele et al. (Eds.): MIDI 2022, LNNS 710, pp. 321-322, 2023. 
https://doi.org/10.1007/978-3-031-37649-8 


322 


Pochwatko, Grzegorz 150, 159, 176, 202, 
222, 287 
Popielski, Krzysztof 281, 314 


R 

Rąpała, M. 56 

Rapata, Michał 167 

Raza, Rana Hammad 12 
Romanowski, Andrzej 213 
Rosén, Anna 93 
Różańska-Walczuk, Monika 260 


S 

Sarp, Salih 111 

Sibilska-Mroziewicz, Anna 281,307,314 
Sjóblom, Lisa 93 

Skorupska, Kinga 150, 176, 287 
Sobecki, Piotr 76, 85, 103 

Sobiech, Franciszek 213 

Sultan, Zamra 12 


Suvilehto, Juulia 93 
Świdrak, Justyna 150, 159, 222 


T 

Tikka, Pia 127 

Tupikowski, Krzysztof 76, 103 
Turchan, K. 56 

Turchan, Krzysztof 167 


Vv 
Volungevičiūtė, Laima 191 


W 

Walczak, Natalia 213 
Wierzbowski, Mariusz 202 
Wołoszyn, Kamil 56, 167 


Y 
Yildirim, Enver 111 


Z 
Zoubi, Mohamad Khir 93 


